OPINIONBY: NEWMAN

OPINION: NEWMAN, Circuit Judge. 

Both of the parties to a patent interference proceeding have appealed the decision of the Board of Patent Appeals and 
Interferences of the United States Patent and Trademark Office, wherein the Board held that the specification of neither 
party met the written description requirement of the patent statute. Capon v. Eshhar, Interf. No. 103,887 (Bd. Pat. App. & 
Interf. Mar. 26, 2003). The Board dissolved the interference and cancelled all 
[*2] of the claims of both parties corresponding to the interference count. With this ruling, the Board terminated the
proceeding and did not reach the question of priority of invention. We conclude that the Board erred in its application of 
the law of written description. The decision is vacated and the case is remanded to the Board for further proceedings. 

BACKGROUND 

Daniel J. Capon, Arthur Weiss, Brian A. Irving, Margo R. Roberts, and Krisztina Zsebo (collectively "Capon") and 
Zelig Eshhar, Daniel Schindler, Tova Waks, and Gideon Gross (collectively "Eshhar") were the parties to an interference 
proceeding between Capon's United States Patent No. 6,407,221 ("the '221 patent") entitled "Chimeric Chains for
Receptor-Associated Signal Transduction Pathways" and Eshhar's patent application Serial No. 08/084,994 ("the '994
application") entitled "Chimeric Receptor Genes and Cells Transformed Therewith." Capon's Patent No. 5,359,046 ("the
'046 patent"), parent of the '221 patent, was also included in the interference but was held expired for non-payment of a
maintenance fee. The PTO included the '046 patent in its decision and in its argument of this appeal. n1

n1 Although Capon is designated as appellant and Eshhar as cross-appellant, both appealed the Board's decision.
See Fed. R. App. P. 28(h). The Director of the PTO intervened to support the Board, and has fully participated in
this appeal.
[*3]

A patent interference is an administrative proceeding pursuant to 35 U.S.C. §§ 102(g) and 135(a), conducted for the
purpose of determining which of competing applicants is the first inventor of common subject matter. An interference 
is instituted after the separate patent applications have been examined and found to contain patentable subject matter. 
Capon's patents had been examined and had issued before this interference was instituted, and Eshhar's application had 
been examined and allowed but a patent had not yet issued. 

During an interference proceeding the Board is authorized to determine not only priority of invention but also to 
redetermine patentability. 35 U.S.C. § 6(b). The question of patentability of the claims of both parties was raised sua 
sponte by an administrative patent judge during the preliminary proceedings. Thereafter the Board conducted an inter 
partes proceeding limited to this question, receiving evidence and argument. The Board then invalidated all of the claims 
that had been designated as corresponding to the count of the interference, viz., all of the claims of the Capon '221 patent,
claims 5-8 of [*4] the Capon '046 patent, and claims 1-7, 9-20, and 23 of the Eshhar '994 application.

In accordance with the Administrative Procedure Act, the law as interpreted and applied by the agency receives plenary 
review on appeal, and the agency's factual findings are reviewed to determine whether they were arbitrary, capricious, or 
unsupported by substantial evidence in the administrative record. See 5 U.S.C. § 706(2); Dickinson v. Zurko, 527 U.S.
150, 164-65, 144 L. Ed. 2d 143, 119 S. Ct. 1816 (1999); In re Gartside, 203 F.3d 1305, 1315 (Fed. Cir. 2000).

The Invention 

A chimeric gene is an artificial gene that combines segments of DNA in a way that does not occur in nature. The '221
patent and '994 application are directed to the production of chimeric genes designed to enhance the immune response by 
providing cells with specific cell-surface antibodies in a form that can penetrate diseased sites, such as solid tumors, that 
were not previously reachable. The parties explain that their invention is a way of endowing immune cells with antibody- 
type specificity, by combining known antigen-binding-domain producing DNA and known lymphocyte-receptor-protein 
[*5] producing DNA into a unitary gene that can express a unitary polypeptide chain. Eshhar summarized the problem to
which the invention is directed: 

Antigen-specific effector lymphocytes, such as tumor-specific T cells, are very rare, individual-specific, 
limited in their recognition spectrum and difficult to obtain against most malignancies. Antibodies, on the 
other hand, are readily obtainable, more easily derived, have wider spectrum and are not individual-specific. 
The major problem of applying specific antibodies for cancer immunotherapy lies in the inability of sufficient 
amounts of monoclonal antibodies (mAb) to reach large areas within solid tumors. 

Technical Paper Explaining Eshhar's Invention, at 6. 

The inventions of Capon and Eshhar are the chimeric DNA that encodes single-chain chimeric proteins for expression 
on the surface of cells of the immune system, plus expression vectors and cells transformed by the chimeric DNA. 
The experts for both parties explain that the invention combines selected DNA segments that are both endogenous and 
nonendogenous to a cell of the immune system, whereby the nonendogenous segment encodes the single-chain variable 
("scFv") [*6] domain of an antibody, and the endogenous segment encodes cytoplasmic, transmembrane, and extracellular
domains of a lymphocyte signaling protein. They explain that the scFv domain combines the heavy and light variable 
("Fv") domains of a natural antibody, and thus has the same specificity as a natural antibody. Linking this single chain 
domain to a lymphocyte signaling protein creates a chimeric scFv-receptor ("scFvR") gene which, upon transfection into 
a cell of the immune system, combines the specificity of an antibody with the tissue penetration, cytokine production, and 
target-cell destruction capability of a lymphocyte. 

The parties point to the therapeutic potential if tumors can be infiltrated with specifically designed immune cells of 
appropriate anti-tumor specificity. 

The Eshhar Claims 

The Board held unpatentable the following claims of Eshhar's '994 application; these were all of the '994 claims that 
had been designated as corresponding to the count of the interference. Eshhar's claim 1 was the designated count. 

1. A chimeric gene comprising

a first gene segment encoding a single-chain Fv domain (scFv) of a specific antibody and 
a second gene segment [*7] encoding partially or entirely the transmembrane and cytoplasmic, and optionally the extracellular,
domains of an endogenous protein 

wherein said endogenous protein is expressed on the surface of cells of the immune system and triggers 
activation and/or proliferation of said cells, 

which chimeric gene, upon transfection to said cells of the immune system, expresses said scFv domain 
and said domains of said endogenous protein in one single chain on the surface of the transfected cells such 
that the transfected cells are triggered to activate and/or proliferate and have MHC nonrestricted antibody- 
type specificity when said expressed scFv domain binds to its antigen.

2. A chimeric gene according to claim 1 wherein the second gene segment further comprises partially or 
entirely the extracellular domain of said endogenous protein. 

3. A chimeric gene according to claim 1 wherein the first gene segment encodes the scFv domain of an
antibody against tumor cells. 

4. A chimeric gene according to claim 1 wherein the first gene segment encodes the scFv domain of an 
antibody against virus infected cells. 

5. A chimeric gene according to claim 4 wherein the virus is HIV. 



6. [*8] A chimeric gene according to claim 1 wherein the second gene segment encodes a lymphocyte receptor
chain. 

7. A chimeric gene according to claim 6 wherein the second gene segment encodes a chain of the T cell 
receptor. 

9. A chimeric gene according to claim 7 wherein the second gene segment encodes the α, β, γ, or δ chain of
the antigen-specific T cell receptor. 

10. A chimeric gene according to claim 1 wherein the second gene segment encodes a polypeptide of the 
TCR/CD3 complex. 

11. A chimeric gene according to claim 10 wherein the second gene segment encodes the zeta or eta isoform
chain. 

12. A chimeric gene according to claim 1 wherein the second gene segment encodes a subunit of the Fc 
receptor or IL-2 receptor. 

13. A chimeric gene according to claim 12 wherein the second gene segment encodes a common subunit of 
IgE and IgG binding Fc receptors. 

14. A chimeric gene according to claim 13 wherein said subunit is the gamma subunit. 

15. A chimeric gene according to claim 13 wherein the second gene segment encodes the CD16α chain of the
FcγRIII or FcγRII.

16. A chimeric gene according to claim 12 wherein the second gene segment encodes the α [*9] or β subunit of the IL-2 receptor.

17. An expression vector comprising a chimeric gene according to claim 1. 

18. A cell of the immune system endowed with antibody specificity transformed with an expression vector 
according to claim 17. 

19. A cell of the immune system endowed with antibody specificity comprising a chimeric gene according to 
claim 1.

20. A cell of the immune system according to claim 19 selected from the group consisting of a natural killer
cell, a lymphokine activated killer cell, a cytotoxic T cell, a helper T cell and a subtype thereof. 

23. A chimeric gene according to claim 1 wherein said endogenous protein is a lymphocyte receptor chain, a 
polypeptide of the TCR/CD3 complex, or a subunit of the Fc or IL-2 receptor. 

The Board did not discuss the claims separately, and held that the specification failed to satisfy the written description 
requirement as to all of these claims. 

The Capon Claims 

Claims 1-10, all of the claims of the '221 patent, were held unpatentable on written description grounds. Claims 1-6 
are directed to the chimeric DNA, claims 7, 8, and 10 to the corresponding cell comprising the DNA, and claim 9 to 



2005 U.S. App. LEXIS 16865, *10 



[*10] the chimeric protein: 



1. A chimeric DNA encoding a membrane bound protein, said chimeric DNA comprising in reading frame:
DNA encoding a signal sequence which directs said membrane bound protein to the surface membrane; 

DNA encoding a non-MHC restricted extracellular binding domain which is obtained from a single chain 
antibody that binds specifically to at least one ligand, wherein said at least one ligand is a protein on the 
surface of a cell or a viral protein; 

DNA encoding a transmembrane domain which is obtained from a protein selected from the group 
consisting of CD4, CD8, immunoglobulin, the CD3 zeta chain, the CD3 gamma chain, the CD3 delta chain
and the CD3 epsilon chain; and

DNA encoding a cytoplasmic signal-transducing domain of a protein that activates an intracellular 
messenger system which is obtained from CD3 zeta, 

wherein said extracellular domain and said cytoplasmic domain are not naturally joined together, and said 
cytoplasmic domain is not naturally joined to an extracellular ligand-binding domain, and when said chimeric 
DNA is expressed as a membrane bound protein in a host cell under conditions suitable for expression, said 
membrane bound protein [*11] initiates signaling in said host cell when said extracellular domain binds said at least one ligand.

2. The DNA of claim 1, wherein said single-chain antibody recognizes an antigen selected from the group 
consisting of viral antigens and tumor cell associated antigens. 

3. The DNA of claim 2 wherein said single-chain antibody is specific for the HIV env glycoprotein. 

4. The DNA of claim 1, wherein said transmembrane domain is naturally joined to said cytoplasmic domain. 

5. An expression cassette comprising a transcriptional initiation region, the DNA of claim 1 under the 
transcriptional control of said transcriptional initiation region, and a transcriptional termination region. 

6. A retroviral RNA or DNA construct comprising the expression cassette of claim 5. 

7. A cell comprising the DNA of claim 1. 

8. The cell of claim 7, wherein said cell is a human cell. 

9. A chimeric protein comprising in the N-terminal to C-terminal direction: 

a non-MHC restricted extracellular binding domain which is obtained from a single chain antibody that 
binds specifically to at least one ligand, wherein said at least one ligand is a protein on the surface of a cell 
[*12] or a viral protein;

a transmembrane domain which is obtained from a protein selected from the group consisting CD4, CD8,
immunoglobulin, the CD3 zeta chain, the CD3 gamma chain, the CD3 delta chain and the CD3 epsilon chain;
and 

a cytoplasmic signal-transducing domain of a protein that activates an intracellular messenger system 
which is obtained from CD3 zeta,

wherein said extracellular domain and said cytoplasmic domain are not naturally joined together, and 
said cytoplasmic domain is not naturally joined to an extracellular ligand-binding domain, and when said 
chimeric protein is expressed as a membrane bound protein in a host cell under conditions suitable for 
expression, said membrane bound protein initiates signaling in said host cell when said extracellular domain 
binds said at least one ligand. 

10. A mammalian cell comprising as a surface membrane protein, the protein of claim 9. 

In addition, claims 5, 6, 7, and 8 of Capon's '046 patent were held unpatentable. These claims are directed to chimeric 
DNA sequences where the encoded extracellular domain is a single-chain antibody containing ligand binding activity. 

The Board Decision

The Board presumed [*13] enablement by the specifications of the '221 patent and '994 application of the full scope of their claims, and based
its decision solely on the ground of failure of written description. The Board held that neither party's specification provides
the requisite description of the full scope of the chimeric DNA or encoded proteins, by reference to knowledge in the art
of the "structure, formula, chemical name, or physical properties" of the DNA or the proteins. In the Board's words: 

We are led by controlling precedent to understand that the full scope of novel chimeric DNA the parties claim 
is not described in their specifications under 35 U.S.C. § 112, first paragraph, by reference to contemporary
and/or prior knowledge in the art of the structure, formula, chemical name, or physical properties of many 
protein domains, and/or DNA sequences which encode many protein domains, which comprise single-chain 
proteins and/or DNA constructs made in accordance with the plans, schemes, and examples thereof the 
parties disclose. 



Bd. op. at 4. As controlling precedent the Board cited Regents of the University of California v. Eli Lilly & Co., 119
F.3d 1559 (Fed. Cir. 1997); [*14] Fiers v. Revel, 984 F.2d 1164 (Fed. Cir. 1993); Amgen, Inc. v. Chugai Pharmaceutical Co., 927 F.2d 1200 (Fed. Cir.
1991); and Enzo Biochem, Inc. v. Gen-Probe, Inc., 296 F.3d 1316 (Fed. Cir. 2002). The Board summarized its holding as
follows:

Here, both Eshhar and Capon claim novel genetic material described in terms of the functional characteristics 
of the protein it encodes. Their specifications do not satisfy the written description requirement because 
persons having ordinary skill in the art would not have been able to visualize and recognize the identity 
of the claimed genetic material without considering additional knowledge in the art, performing additional 
experimentation, and testing to confirm results. 

Bd. op. at 89.

DISCUSSION

Eshhar and Capon challenge both the Board's interpretation of precedent and the Board's ruling that their descriptions 
are inadequate. Both parties explain that their chimeric genes are produced by selecting and combining known heavy- and
light-chain immune-related DNA segments, using known DNA-linking procedures. The specifications of both parties
describe procedures for identifying
[*15] and obtaining the desired immune-related DNA segments and linking them into the desired chimeric genes. Both
parties point to their specific examples of chimeric DNA prepared using identified known procedures, along with citation 
to the scientific literature as to every step of the preparative method. 

The parties presented expert witnesses who placed the invention in the context of prior knowledge and explained how 
the descriptive text would be understood by persons of skill in the field of the invention. The witnesses explained that the 
principle of forming chimeric genes from selected segments of DNA was known, as well as their methods of identifying,
selecting, and combining the desired segments of DNA. Dr. Eshhar presented an expert statement wherein he explained 
that the prior art contains extensive knowledge of the nucleotide structure of the various immune-related segments of 
DNA; he stated that over 785 mouse antibody DNA light chains and 1,327 mouse antibody DNA heavy chains were 
known and published as early as 1991. Similarly Capon's expert Dr. Desiderio discussed the prior art, also citing scientific 
literature: 

The linker sequences disclosed in the '221 patent [*16] (col. 24, lines 4 and 43) used to artificially join a heavy and light chain nucleic acid sequence and
permit functional association of the two ligand binding regions were published by 1990, as were the methods
for obtaining the mature sequences of the desired heavy and light chains for constructing a SAb (Exhibit 47,
Batra et al., J. Biol. Chem., 1990; Exhibit 48, Bird et al., Science, 1988; Exhibit 50, Huston et al., PNAS,
1988; Exhibit 51, Chaudhary, PNAS, 1990; Exhibit 56, Morrison et al., Science, 1985; Exhibit 53, Sharon et
al., Nature, 1984).

Desiderio declaration at 4 ¶ 11.

Both parties stated that persons experienced in this field would readily know the structure of a chimeric gene made 
of a first segment of DNA encoding the single-chain variable region of an antibody, and a second segment of DNA 
encoding an endogenous protein. They testified that re-analysis to confirm these structures would not be needed in order 
to know the DNA structure of the chimeric gene, and that the Board's requirement that the specification must reproduce 
the "structure, formula, chemical name, or physical properties" of these DNA combinations had been overtaken by the 
state of the science. 
[*17] They stated that where the structure and properties of the DNA components were known, reanalysis was not
required. 

Eshhar's specification contains the nucleotide sequences of sixteen different receptor primers and four different scFv 
primers from which chimeric genes encoding scFvR may be obtained, while Capon's specification cites literature sources
of such information. Eshhar's specification shows the production of chimeric genes encoding scFvR using primers, as 
listed in Eshhar's Table I. Capon stated that natural genes are isolated and joined using conventional methods, such as 
the polymerase chain reaction or cloning by primer repair. Capon, like Eshhar, discussed various known procedures for 
identifying, obtaining, and linking DNA segments, accompanied by experimental examples. The Board did not dispute 
that persons in this field of science could determine the structure or formula of the linked DNA from the known structure
or formula of the components. 

The Board stated that "controlling precedent" required inclusion in the specification of the complete nucleotide 
sequence of "at least one" chimeric gene. Bd. op. at 4. The Board also objected that the claims were broader than 
[*18] the specific examples. Eshhar and Capon each responds by pointing to the scientific completeness and depth of their
descriptive texts, as well as to their illustrative examples. The Board did not relate any of the claims, broad or narrow, to 
the examples, but invalidated all of the claims without analysis of their scope and the relation of claim scope to the details 
of the specifications. 

Eshhar and Capon both argue that they have set forth an invention whose scope is fully and fairly described, for the 
nucleotide sequences of the DNA in chimeric combination is readily understood to contain the nucleotide sequences of 
the DNA components. Eshhar points to the general and specific description in his specification of known immune-related 
DNA segments, including the examples of their linking. Capon points similarly to his description of selecting DNA 
segments that are known to express immune-related proteins, and stresses the existing knowledge of these segments and 
their nucleotide sequences, as well as the known procedures for selecting and combining DNA segments, as cited in the 
specification. 

Both parties argue that the Board misconstrued precedent, and that precedent does not establish 
[*19] a per se rule requiring nucleotide-by-nucleotide re-analysis when the structure of the component DNA segments
is already known, or readily determined by known procedures. 

The Statutory Requirement 

The required content of the patent specification is set forth in Section 112 of Title 35:

§ 112 ¶ 1. The specification shall contain a written description of the invention, and of the manner and
process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in 
the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set 
forth the best mode contemplated by the inventor of carrying out his invention. 

The "written description" requirement implements the principle that a patent must describe the technology that is sought 
to be patented; the requirement serves both to satisfy the inventor's obligation to disclose the technologic knowledge upon 
which the patent is based, and to demonstrate that the patentee was in possession of the invention that is claimed. See 
Enzo Biochem, 296 F.3d at 1330 (the written description requirement "is the quid pro quo
[*20] of the patent system; the public must receive meaningful disclosure in exchange for being excluded from practicing
the invention for a limited period of time"); Reiffin v. Microsoft Corp., 214 F.3d 1342, 1345-46 (Fed. Cir. 2000) (the
purpose of the written description requirement "is to ensure that the scope of the right to exclude ... does not overreach the
scope of the inventor's contribution to the field of art as described in the patent specification"); In re Barker, 559 F.2d
588, 592 n.4 (CCPA 1977) (the goal of the written description requirement is "to clearly convey the information that
an applicant has invented the subject matter which is claimed"). The written description requirement thus satisfies the 
policy premises of the law, whereby the inventor's technical/scientific advance is added to the body of knowledge, as 
consideration for the grant of patent exclusivity. 

The descriptive text needed to meet these requirements varies with the nature and scope of the invention at issue, and 
with the scientific and technologic knowledge already in existence. The law must be applied to each invention that enters 
the patent process, for each patented 
[*21] advance is novel in relation to the state of the science. Since the law is applied to each invention in view of the state
of relevant knowledge, its application will vary with differences in the state of knowledge in the field and differences in
the predictability of the science. 

For the chimeric genes of the Capon and Eshhar inventions, the law must take cognizance of the scientific facts. The 
Board erred in refusing to consider the state of the scientific knowledge, as explained by both parties, and in declining
to consider the separate scope of each of the claims. None of the cases to which the Board attributes the requirement of
total DNA re-analysis, i.e., Regents v. Lilly, Fiers v. Revel, Amgen, or Enzo Biochem, require a re-description of what
was already known. In Lilly, 119 F.3d at 1567, the cDNA for human insulin had never been characterized. Similarly in
Fiers, 984 F.2d at 1171, much of the DNA sought to be claimed was of unknown structure, whereby this court viewed the
breadth of the claims as embracing a "wish" or research "plan." In Amgen, 927 F.2d at 1206, the court explained that a
novel gene was
[*22] not adequately characterized by its biological function alone because such a description would represent a mere
"wish to know the identity" of the novel material. In Enzo Biochem, 296 F.3d at 1326, this court reaffirmed that deposit of
a physical sample may replace words when description is beyond present scientific capability. In Amgen Inc. v. Hoechst 
Marion Roussel, Inc., 314 F.3d 1313, 1332 (Fed. Cir. 2003) the court explained further that the written description
requirement may be satisfied "if in the knowledge of the art the disclosed function is sufficiently correlated to a particular,
known structure." These evolving principles were applied in Noelle v. Lederman, 355 F.3d 1343, 1349 (Fed. Cir. 2004),
where the court affirmed that the human antibody there at issue was not adequately described by the structure and function
of the mouse antigen; and in University of Rochester v. G.D. Searle & Co., 358 F.3d 916, 925-26 (Fed. Cir. 2004), where
the court affirmed that the description of the COX-2 enzyme did not serve to describe unknown compounds capable of 
selectively inhibiting the enzyme. 

The "written description" 
[*23] requirement must be applied in the context of the particular invention and the state of the knowledge. The Board's
rule that the nucleotide sequences of the chimeric genes must be fully presented, although the nucleotide sequences of the
component DNA are known, is an inappropriate generalization. When the prior art includes the nucleotide information, 
precedent does not set a per se rule that the information must be determined afresh. Both parties state that a person 
experienced in the field of this invention would know that these known DNA segments would retain their DNA sequences 
when linked by known methods. Both parties explain that their invention is not in discovering which DNA segments are 
related to the immune response, for that is in the prior art, but in the novel combination of the DNA segments to achieve a 
novel result. 

The "written description" requirement states that the patentee must describe the invention; it does not state that every 
invention must be described in the same way. As each field evolves, the balance also evolves between what is known and 
what is added by each inventive contribution. Both Eshhar and Capon explain that this invention does not concern 
[*24] the discovery of gene function or structure, as in Lilly. The chimeric genes here at issue are prepared from known
DNA sequences of known function. The Board's requirement that these sequences must be analyzed and reported in 
the specification does not add descriptive substance. The Board erred in holding that the specifications do not meet the 
written description requirement because they do not reiterate the structure or formula or chemical name for the nucleotide 
sequences of the claimed chimeric genes. 

Claim Scope 

There remains the question of whether the specifications adequately support the breadth of all of the claims that are 
presented. The Director argues that it cannot be known whether all of the permutations and combinations covered by the 
claims will be effective for the intended purpose, and that the claims are too broad because they may include inoperative 
species. The inventors say that they have provided an adequate description and exemplification of their invention as would 
be understood by persons in the field of the invention. They state that biological properties typically vary, and that their 
specifications provide for evaluation of the effectiveness 
[*25] of their chimeric combinations.

It is well recognized that in the "unpredictable" fields of science, it is appropriate to recognize the variability in the 
science in determining the scope of the coverage to which the inventor is entitled. Such a decision usually focuses on 
the exemplification in the specification. See, e.g., Enzo Biochem, 296 F.3d at 1327-28 (remanding for district court to
determine "whether the disclosure provided by the three deposits in this case, coupled with the skill of the art, describes
the genera of claims 1-3 and 5"); Lilly, 119 F.3d at 1569 (genus not described where "a representative number of
cDNAs, defined by nucleotide sequence, falling within the scope of the genus" had not been provided); In re Gosteli, 872
F.2d 1008, 1012 (Fed. Cir. 1989) (two chemical compounds were insufficient description of subgenus); In re Smith, 59
C.C.P.A. 1025, 458 F.2d 1389, 1394-95 (CCPA 1972) (disclosure of genus and one species was not sufficient description
of intermediate subgenus); In re Grimme, 47 C.C.P.A. 785, 274 F.2d 949, 952, 1960 Dec. Comm'r Pat. 123 (CCPA 1960)
(disclosure of single example and
[*26] statement of scope sufficient disclosure of subgenus).

Precedent illustrates that the determination of what is needed to support generic claims to biological subject matter 
depends on a variety of factors, such as the existing knowledge in the particular field, the extent and content of the prior 
art, the maturity of the science or technology, the predictability of the aspect at issue, and other considerations appropriate
to the subject matter. See, e.g., In re Wallach, 378 F.3d 1330, 1333-34 (Fed. Cir. 2004) (an amino acid sequence supports
"the entire genus of DNA sequences" that can encode the amino acid sequence because "the state of the art has developed"
such that it is a routine matter to convert one to the other); University of Rochester, 358 F.3d at 925 (considering whether
the patent disclosed the compounds necessary to practice the claimed method, given the state of technology); Singh v.
Brake, 317 F.3d 1334, 1343, 48 Fed. Appx. 766 (Fed. Cir. 2002) (affirming adequacy of disclosure by distinguishing
precedent in which the selection of a particular species within the claimed genus had involved "highly unpredictable 
results"). 



It [*27] is not necessary that every permutation within a generally operable invention be effective in order for an inventor to
obtain a generic claim, provided that the effect is sufficiently demonstrated to characterize a generic invention. See In re
Angstadt, 537 F.2d 498, 504 (CCPA 1976) ("The examples, both operative and inoperative, are the best guidance this art
permits, as far as we can conclude from the record"). While the Board is correct that a generic invention requires adequate 
support, the sufficiency of the support must be determined in the particular case. Both Eshhar and Capon present not only 
general teachings of how to select and recombine the DNA, but also specific examples of the production of specified 
chimeric genes. For example, Eshhar points out that in Example 1 of his specification the FcRγ chain was used, which 
chain was amplified from a human cDNA clone, using the procedure of Kuster, H. et al., J. Biol. Chem., 265:6448-6451 
(1990), which is cited in the specification and reports the complete sequence of the FcRγ chain. Eshhar's Example 1 also 
explains the source of the genes that provide the heavy and light chains of the single chain antibody, 
[*28] citing the PhD thesis of Gideon Gross, a co-inventor, which cites a reference providing the complete sequence of 
the Sp6 light chain gene used to construct the single-chain antibody. Eshhar states that the structure of the Sp6 heavy 
chain antibody was well known to those of skill in the art and readily accessible on the internet in a database as entry 
EMBL: MMSP6718. Example 5 at page 54 of the Eshhar specification cites Ravetch et al., J. Exp. Med., 170:481-497 
(1989) for the method of producing the CD16α DNA clone that was PCR amplified; this reference published the complete 
DNA sequence of the CD16α chain, as discussed in paragraph 43 of the Eshhar Declaration. Example 3 of the Eshhar 
specification uses the DNA of the monoclonal anti-HER2 antibody and states that the N29 hybridoma that produces this 
antibody was deposited with the Collection Nationale de Cultures de Microorganismes, Institut Pasteur, Paris, on August 
19, 1992, under Deposit No. CNCM I-1262. It is incorrect to criticize the methods, examples, and referenced prior art of 
the Eshhar specification as but "a few PCR primers and probes," as does the Director's brief. 

Capon's Example 3 provides a detailed description 
[*29] of the creation and expression of single chain antibody fused with T-cell receptor zeta chain, referring to published 
vectors and procedures. Capon, like Eshhar, describes gene segments and their ligation to form chimeric genes. Although 
Capon includes fewer specific examples in his specification than does Eshhar, both parties used standard systems of 
description and identification, as well as known procedures for selecting, isolating, and linking known DNA segments. 
Indeed, the Board's repeated observation that the full scope of all of the claims appears to be "enabled" cannot be 
reconciled with the Board's objection that only a "general plan" to combine unidentified DNA is presented. See In re 
Wands, 858 F.2d 731, 736-37 (Fed. Cir. 1988) (experimentation to practice invention must not be "undue" for invention 
to be considered enabled). 

The PTO points out that for biochemical processes relating to gene modification, protein expression, and immune 
response, success is not assured. However, generic inventions are not thereby invalid. Precedent distinguishes among 
generic inventions that are adequately supported, those that are merely a "wish" or "plan," the words of 
[*30] Fiers v. Revel, 984 F.2d at 1171, and those in between, as illustrated by Noelle v. Lederman, 355 F.3d at 1350; the 
facts of the specific case must be evaluated. The Board did not discuss the generic concept that both Capon and Eshhar 
described — the concept of selecting and combining a gene sequence encoding the variable domain of an antibody and 
a sequence encoding a lymphocyte activation protein, into a single DNA sequence which, upon expression, allows for 
immune responses that do not occur in nature. The record does not show this concept to be in the prior art, and includes 
experimental verification as well as potential variability in the concept. 

Whether the inventors demonstrated sufficient generality to support the scope of some or all of their claims, must be 
determined claim by claim. The Board did not discuss the evidence with respect to the generality of the invention and 
the significance of the specific examples, instead simply rejecting all the claims for lack of a complete chimeric DNA 
sequence. As we have discussed, that reasoning is inapt for this case. The Board's position that the patents at issue were 
merely an "invitation to 
[*31] experiment" did not distinguish among the parties' broad and narrow claims, and further concerns enablement more 
than written description. See Adang v. Fischhoff, 286 F.3d 1346, 1355 (Fed. Cir. 2002) (enablement involves assessment 
of whether one of skill in the art could make and use the invention without undue experimentation); In re Wright, 999 
F.2d 1557, 1561 (Fed. Cir. 1993) (same). Although the legal criteria of enablement and written description are related and 
are often met by the same disclosure, they serve discrete legal requirements. 

The predictability or unpredictability of the science is relevant to deciding how much experimental support is required 
to adequately describe the scope of an invention. Our predecessor court summarized in In re Storrs, 44 C.C.P.A. 981, 245 
F.2d 474, 478, 1957 Dec. Comm'r Pat. 361 (CCPA 1957) that "it must be borne in mind that, while it is necessary that an 
applicant for a patent give to the public a complete and adequate disclosure in return for the patent grant, the certainty 
required of the disclosure is not greater than that which is reasonable, having due regard to the subject matter involved." 
This aspect may 
[*32] warrant exploration on remand. 

In summary, the Board erred in ruling that §112 imposes a per se rule requiring recitation in the specification of 
the nucleotide sequence of claimed DNA, when that sequence is already known in the field. However, the Board did not 
explore the support for each of the claims of both parties, in view of the specific examples and general teachings in the 
specifications and the known science, with application of precedent guiding review of the scope of claims. 

We remand for appropriate further proceedings. 

VACATED AND REMANDED 

The fly Drosophila melanogaster is one of the most intensively studied 
organisms in biology and serves as a model system for the investigation of 
many developmental and cellular processes common to higher eukaryotes, 
including humans. We have determined the nucleotide sequence of nearly 
all of the ~120-megabase euchromatic portion of the Drosophila genome 
using a whole-genome shotgun sequencing strategy supported by exten- 
sive clone-based sequence and a high-quality bacterial artificial chromo- 
some physical map. Efforts are under way to close the remaining gaps; 
however, the sequence is of sufficient accuracy and contiguity to be 
declared substantially complete and to support an initial analysis of 
genome structure and preliminary gene annotation and interpretation. The 
genome encodes ~13,600 genes, somewhat fewer than the smaller Cae-
norhabditis elegans genome, but with comparable functional diversity. 



The annotated genome sequence of Drosoph-
ila melanogaster, together with its associated 
biology, will provide the foundation for a 
new era of sophisticated functional studies 
(1-3). Because of its historical importance, 
large research community, and powerful re- 
search tools, as well as its modest genome 
size, Drosophila was chosen as a test system 
to explore the applicability of whole-genome 
shotgun (WGS) sequencing for large and 
complex eukaryotic genomes (4). The 
groundwork for this project was laid over 
many years by the fly research community, 
which has molecularly characterized ~2500 
genes; this work in turn has been supported 
by nearly a century of genetics (5). Since 
Drosophila was chosen in 1990 as one of the 
model organisms to be studied under the 
auspices of the federally funded Human Ge- 
nome Project, genome projects in the United 
States, Europe, and Canada have produced a 
battery of genome-wide resources (Table 1). 
The Berkeley and European Drosophila Ge-
nome Projects (BDGP and EDGP) initiated 
genomic sequencing (Tables 1 to 3) and fin-
ished 29 Mb. The bacterial artificial chromo-
some (BAC) map and other genomic resourc-
es available for Drosophila serve both as an 
independent confirmation of the assembly 
of data from the shotgun strategy and as a 
set of resources for further biological anal- 
ysis of the genome. 

The Drosophila genome is 180 Mb in 
size, a third of which is centric heterochro- 
matin (Fig. 1). The 120 Mb of euchromatin is 
on two large autosomes and the X chromo- 
some; the small fourth chromosome contains 
only ~1 Mb of euchromatin. The heterochro-
matin consists mainly of short, simple se-
quence elements repeated for many mega-
bases, occasionally interrupted by inserted 
transposable elements, and tandem arrays of 
ribosomal RNA genes. It is known that 
there are small islands of unique sequence 
embedded within heterochromatin — for ex- 
ample, the mitogen-activated protein kinase 
gene rolled on chromosome 2, which is 
flanked on each side by at least 3 Mb of 
heterochromatin. Unlike the C. elegans ge- 
nome, which can be completely cloned in 
yeast artificial chromosomes (YACs), the 
simple sequence repeats are not stable in 
YACs (6) or other large-insert cloning sys-
tems. This has led to a functional definition 
of the euchromatic genome as that portion 
of the genome that can be cloned stably in 
BACs. The euchromatic portion of the ge- 
nome is the subject of both the federally 
funded Drosophila sequencing project and 
the work presented here. We began WGS 
sequencing of Drosophila less than 1 year 
ago, with two major goals: (i) to test the 
strategy on a large and complex eukaryotic 
genome as a prelude to sequencing the 
human genome, and (ii) to provide a com- 
plete, high-quality genomic sequence to the 
Drosophila research community so as to 
advance research in this important model 
organism. 

WGS sequencing is an effective and effi- 
cient way to sequence the genomes of pro- 
karyotes, which are generally between 0.5 
and 6 Mb in size (7). In this strategy, all the 
DNA of an organism is sheared into segments 
a few thousand base pairs (bp) in length and 
cloned directly into a plasmid vector suitable 
for DNA sequencing. Sufficient DNA se- 
quencing is performed so that each base pair 
is covered numerous times, in fragments of 
~500 bp. After sequencing, the fragments are 
assembled in overlapping segments to recon- 
struct the complete genome sequence. 
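
The phrase "each base pair is covered numerous times" can be made concrete with a small calculation. The sketch below is not from the article: it assumes the usual definition of fold coverage (reads times read length divided by genome size) and adds a simple Poisson approximation for the fraction of bases missed entirely. The function names and the example numbers are illustrative assumptions.

```python
import math


def sequence_coverage(n_reads: int, mean_read_len_bp: float, genome_size_bp: float) -> float:
    """Average number of times each base is sequenced (fold coverage)."""
    return n_reads * mean_read_len_bp / genome_size_bp


def expected_unsequenced_fraction(coverage: float) -> float:
    """Poisson approximation: probability that a given base is hit by no read."""
    return math.exp(-coverage)


# Illustrative numbers only (assumed, not taken from the article).
c = sequence_coverage(n_reads=1_000_000, mean_read_len_bp=500, genome_size_bp=100_000_000)
print(f"coverage = {c:.1f}X, expected unsequenced fraction = {expected_unsequenced_fraction(c):.4f}")
```

Under this approximation the missed fraction falls off exponentially with coverage, which is one way to see why shotgun projects sequence each base many times over. Repeats and cloning bias are ignored here.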

In addition to their much larger size, 
eukaryotic genomes often contain substan- 
tial amounts of repetitive sequence that 
have the potential to interfere with correct 
sequence assembly. Weber and Myers (8) 
presented a theoretical analysis of WGS 
sequencing in which they examined the 
impact of repetitive sequences, discussed 
experimental strategies to mitigate their ef- 
fect on sequence assembly, and suggested 
that the WGS method could be applied 
effectively to large eukaryotic genomes. A 
key component of the strategy is obtaining 
sequence data from each end of the cloned 
DNA inserts; the juxtaposition of these 
end-sequences ("mate pairs") is a critical 
element in producing a correct assembly. 

Genomic Structure 

WGS libraries were prepared with three differ- 
ent insert sizes of cloned DNA: 2 kb, 10 kb, and 
130 kb. The 10-kb clones are large enough to 
span the most common repetitive sequence el- 
ements in Drosophila, the retrotransposons. 
End-sequence from the BACs provided long-
range linking information that was used to con-
firm the overall structure of the assembly (9). 
More than 3 million sequence reads were ob-
tained from whole-genome libraries (Fig. 2 and 
Table 2). Only ~2% of the sequence reads 
contained heterochromatic simple sequence re- 
peats, indicating that the heterochromatic DNA 
is not stably cloned in the small-insert vectors 
used for the WGS libraries. A BAC-based 
physical map spanning >95% of the euchro- 
matic portion of the genome was constructed by 
screening a BAC library with sequence-tagged 
site (STS) markers (10). More than 29 Mb of 
high-quality finished sequence has been com-
pleted from BAC, P1, and cosmid clones, and 
draft sequence data (~1.5X average coverage) 
were obtained from an additional 825 BAC and 
P1 clones spanning in total >90% of the ge-
nome (Table 3). The clone-based draft se-
quence served two purposes: It improved the 
likelihood of accurate assembly, and it allowed 
the identification of templates and primers for 
filling gaps that remain after assembly. An ini-
tial assembly was performed using the WGS 
data and BAC end-sequence [WGS-only as-
sembly (4)]; subsequent assemblies included 
the clone-based draft sequence data (joint as- 
sembly). Figure 3 and Table 3 illustrate the 
status of the euchromatic sequence resulting 
from each of these assemblies and the current 
status following the directed gap closure com- 
pleted to date. The sequence assembly process 
is described in detail in an accompanying paper 
(11). 

Assembly resulted in a set of "scaffolds." 
Each scaffold is a set of contiguous sequences 
(contigs), ordered and oriented with respect to 
one another by mate-pairs such that the gaps 
between adjacent contigs are of known size and 
are spanned by clones with end-sequences 
flanking the gap. Gaps within scaffolds are 
called sequence gaps; gaps between scaffolds 
are called "physical gaps" because there are no 
clones identified spanning the gap. Two meth- 
ods were used to map the scaffolds to chromo- 
somes: (i) cross-referencing between STS 
markers present in the assembled sequence and 
the BAC-based STS content map, and (ii) 
cross-referencing between assembled sequence 
and shotgun sequence data obtained from indi-
vidual tiling-path clones selected from the BAC 
physical map. The mapped scaffolds from the 
joint assembly, totaling 116.2 Mb after initial 
gap closure, were deposited in GenBank (ac-
cession numbers AE002566-AE003403) and 
form the basis for the analysis described in this 
article. 

Fig. 1. Mitotic chromosomes of D. melanogaster, showing euchromatic regions, heterochromatic 
regions, and centromeres. Arms of the autosomes are designated 2L, 2R, 3L, 3R, and 4. The 
euchromatic length in megabases is derived from the sequence analysis. The heterochromatic 
lengths are estimated from direct measurements of mitotic chromosome lengths (67). The 
heterochromatic block of the X chromosome is polymorphic among stocks and varies from 
one-third to one-half of the length of the mitotic chromosome. The Y chromosome is nearly 
entirely heterochromatic. 

The WGS-only assembly resulted in 50 
scaffolds spanning 114.8 Mb that could be 
placed unambiguously onto chromosomes 
solely on the basis of their STS content (la-
beled "D" in Fig. 3). The joint assembly 
included clone-based sequence, but no spe- 
cific advantage was taken of the location infor- 
mation of each clone-based read by the whole- 
genome assembly algorithm. Nonetheless, the 
clone-based sequence from BACs in the phys- 
ical map allowed placement of an additional 84 
small scaffolds (1.4 Mb) on chromosome arms 
in the joint assembly (labeled "C" in Fig. 3). As 
shown in Fig. 3, a few large scaffolds in each 
assembly span a large portion of each chromo-
some arm, with a number of additional smaller 
scaffolds located at the centromeric end, except 
on the right arm of chromosome 3. Nearly all of 
the scaffolds added to chromosomes in the joint 
assembly, relative to the WGS-only assembly, 
are adjacent to the centric heterochromatin, 
which demonstrates the utility of the physical 
map in these regions. The density of transpos- 
able elements (labeled "A" in Fig. 3) increases 
markedly in the transition zone between 
euchromatin and heterochromatin, as dis- 
cussed below. An additional 704 scaffolds 
in the joint assembly, equivalent to 3.8 Mb, 
could not be placed with accuracy on the 
genome. Most of these do not match clone- 
based sequence from the physical map, and 
therefore they most likely represent small 
islands of unique sequence embedded with- 
in regions of heterochromatin. Because of the 
instability of the surrounding genomic regions, 
these sequences would not have been obtained 
through a sequencing approach that was depen-
dent on cloning in large-insert vectors. 

Among the 134 mapped scaffolds, there 
were 1636 contigs after assembly (hence 1630 
gaps, considering that there are six linear chro-
mosome arm segments to be assembled). On 
the major autosomes, there are five physical 
gaps in the BAC map, three of which are near 
a centromere or telomere (10). Because the 
WGS approach did not span these gaps, they 
likely contain unclonable regions. Most gaps on 
the autosomes, including gaps between scaf-
folds, were therefore cloned in either WGS 
clones or BAC subclones used for clone-based 
draft sequencing and are considered sequence 
gaps. Directed gap closure was done through 
use of several resources, including whole BAC 
clones, plasmid subclones, and M13 subclones 
from the Lawrence Berkeley National Labora-
tory (LBNL) and Baylor College of Medicine 
centers' draft sequence of BAC and P1 clones; 
10-kb subclones from the whole-genome librar-
ies; and polymerase chain reaction (PCR) from 
genomic DNA (12). The average size of the 
gaps filled to date is 771 bp (their predicted size 
was 757 bp); the predicted size of the remaining 
gaps is 2120 bp. Table 3 provides details of the 
status of each chromosome arm as of 3 March 
2000. 
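
The scaffold and gap counts quoted above follow from simple bookkeeping, sketched here as a check. The only reasoning added is the standard observation that k contigs laid end to end on one linear segment leave k - 1 gaps, so N contigs spread over s segments leave N - s gaps. Variable and function names are illustrative assumptions; the numbers are the ones given in the text.

```python
def gaps_between_contigs(n_contigs: int, n_linear_segments: int) -> int:
    """Each linear segment with k contigs has k - 1 internal gaps."""
    return n_contigs - n_linear_segments


# Figures quoted in the text.
wgs_only_scaffolds, wgs_only_mb = 50, 114.8   # WGS-only assembly, placed by STS content
added_scaffolds, added_mb = 84, 1.4           # extra scaffolds placed in the joint assembly

mapped_scaffolds = wgs_only_scaffolds + added_scaffolds
mapped_mb = round(wgs_only_mb + added_mb, 1)
gaps = gaps_between_contigs(n_contigs=1636, n_linear_segments=6)

print(mapped_scaffolds, mapped_mb, gaps)
```

The three results agree with the 134 mapped scaffolds, the 116.2 Mb of mapped sequence, and the 1630 gaps reported in the article.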

The accuracy of the assembly was measured 
in several ways, as described (11). In summary, 
the scaffold sequences agree very well with the 
BAC-based STS content map and with high-
quality finished sequence. In the 7 Mb of the 
genome where very high-quality sequence was 
available for comparison, the accuracy of the 
assembled sequence was 99.99% in nonrepeti-
tive regions. In the ~2.5% of the region com-
prising the most highly repetitive sequences, the 
accuracy was 99.5%. 

Fig. 2. Accuracy of sequence reads from ABI Prism 3700 DNA analyzer. A database of BAC and P1 
clone sequences from BDGP finished to high accuracy (less than one 
error predicted per 100,000 bases) was constructed. Trimmed WGS sequence reads matching these 
BAC and P1 clones were identified by BLAST. The first high-scoring pair (HSP) with a full-length 
match was used. Identity is the percentage of matched nucleotides in the alignment; 49,756 
sequence reads from 2-kb libraries and 23,455 reads from 10-kb libraries matched these BAC and 
P1 sequences. The average trimmed read length of sequences from 2-kb and 10-kb clones was 570 
bp and 567 bp, respectively. 

Table 1. Genomic resources for Drosophila. 

Type: BAC-based STS content map 
  Description: STS content map constructed by screening ~23X genome coverage of BAC clones; a tiling path of BACs spanning each chromosome arm was selected 
  Resolution: 50 kb 
  Contribution: Location of whole-genome scaffolds to chromosomes; confirmation of accuracy of assembly 
  Source and reference: BDGP [chromosomes 2 and 3 (70)], EDGP [X chromosome (69), www.dundee.ac.uk/anatphys/robert/Xdivs/Maplntro.htm], University of Alberta [chromosome 4 (70)] 

Type: Polytene map 
  Description: Tiling-path BACs hybridized to polytene chromosomes 
  Resolution: 30 kb 
  Contribution: Location of STSs and BACs to chromosomes; validation of BAC map 
  Source and reference: See (10) 

Type: BAC end-sequence 
  Description: ~500 bp of sequence from each end of a BAC clone 
  Resolution: Two reads per ~130 kb 
  Contribution: Long-range association of sequence contigs 

Type: Finished clone-based sequence 
  Description: BAC, P1, and cosmid clones completely sequenced to high accuracy 
  Resolution: ~29 Mb of total sequence 
  Contribution: Assessment of accuracy of Celera sequence and assembly 

Type: Draft sequence from mapped BACs 
  Description: ~1.5X shotgun sequence coverage of 825 clones from the tiling path of BAC and P1 clones 
  Resolution: 384 reads distributed across ~160 kb 
  Contribution: Location of sequence contigs to a small genomic region; templates for gap closure 

Heterochromatin-euchromatin transi-
tion zone. The genomes of eukaryotes gen-
erally contain heterochromatic regions sur-
rounding the centromeres that are intractable 
to all current sequencing methods. In Dro-
sophila, ~60 Mb of the 180-Mb genome 
consists of centric heterochromatin, which is 
composed primarily of simple sequence sat- 
ellites, transposons, and two large blocks of 
ribosomal RNA genes (13). We examined the 
sequence organization at boundaries between 
euchromatin and centric heterochromatin in 
two regions, one in division 20 on the X 
chromosome and the other in division 40 on the 
left arm of chromosome 2. On the X chromo-
some, gene density in division 20 drops abrupt-
ly, to two genes in 400 kb around folded gas-
trulation, and then rises to 11 genes in 130 kb. 
Next, at least 10 Mb of largely satellite DNA 
sequences and the ribosomal RNA gene cluster 
are located just distal to the centromere itself. 
On the left arm of chromosome 2, a similar 
situation exists: There is a normal gene density 
in division 39, followed by only two genes in 
350 kb near teashirt in division 40, then by a 
200-kb region containing 10 genes. These tran-
sition zones between euchromatin and hetero-
chromatin contain many previously unknown 
genes, including counterparts to human cyclin 
K and mouse Krox-4. None of the 11 genes 
proximal to teashirt and only one of the 10 
genes proximal to folded gastrulation was 
known previously. 

What is the nature of the sequence in the 
gene-poor regions? The most common se- 
quences by far were transposons, consistent 
with previous small-scale analyses (14). 
These include several new elements similar 
to transposons in other species, as well as the 
~50 transposon classes previously character-
ized in Drosophila. Some short runs of satel-
lite sequences are present, but it has not been 
determined whether they might have been 
truncated during cloning. In addition, at least 
110 other simple repeat classes were identi- 
fied, some of which are distributed widely 
outside of heterochromatin. 

Criteria for describing the completion 
status of a eukaryotic genome. Because of 
the unclonable repetitive DNA surrounding the 
centromeres, it is highly unlikely that the 
genomic sequence of chromosomes from eu-
karyotes such as Drosophila or human will ever 
be "complete." It is therefore necessary to pro-
vide an assessment of the contiguity and accu-
racy of the sequence. Table 4 lists several ob-
jective parameters by which the status can be 
judged and by which improvements in future 
releases can be measured. We have termed the 
version of the sequence associated with this 
publication "Release 1" and intend to make 
regular future releases as gaps are filled and 
overall sequence accuracy is increased. 

Table 2. Source of data for assembly: Whole-genome shotgun sequencing. See (65) for more information 
about library construction and sequencing. 

Vector              Insert size (kbp)  Paired sequences  Total sequences  Clone coverage  Sequence coverage 
High-copy plasmid   2                  732,380           1,903,468        11.2X           7.3X 
Low-copy plasmid    10                 548,974           1,278,386        42.2X           5.4X 
BAC                 130                9,869             19,738           11.4X           0.07X 
Total                                  1,290,823         3,201,592        64.8X           12.8X 
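
The "Total" row of Table 2 can be cross-checked against its per-library rows. The snippet below is only a consistency sketch: it re-adds the total-sequence, clone-coverage, and sequence-coverage columns as printed in the table. The dictionary layout and key names are assumptions made for illustration.

```python
# Per-library values as printed in Table 2 (total sequences, clone coverage, sequence coverage).
libraries = {
    "high_copy_plasmid_2kb": {"total_reads": 1_903_468, "clone_cov": 11.2, "seq_cov": 7.3},
    "low_copy_plasmid_10kb": {"total_reads": 1_278_386, "clone_cov": 42.2, "seq_cov": 5.4},
    "bac_130kb": {"total_reads": 19_738, "clone_cov": 11.4, "seq_cov": 0.07},
}

total_reads = sum(lib["total_reads"] for lib in libraries.values())
total_clone_cov = round(sum(lib["clone_cov"] for lib in libraries.values()), 1)
total_seq_cov = round(sum(lib["seq_cov"] for lib in libraries.values()), 1)

print(total_reads, total_clone_cov, total_seq_cov)
```

These sums match the table's totals of 3,201,592 sequences, 64.8X clone coverage, and 12.8X sequence coverage (the last after rounding to one decimal place).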

One measure of the completeness of the 
assembled sequence is the extent to which 
previously described genes can be found. An 
analysis of the 2783 Drosophila genes with 
some sequence information that have been 
compiled by FlyBase (15) resulted in identi-
fication of 2778 on the scaffold sequence. All 
of the remainder are found in unscaffolded 
sequence. The remaining six were all cloned 
by degenerate PCR, and it is possible that 
some or all of these genes are incorrectly 
ascribed to Drosophila (16). Of the base pairs 
represented in the 2778 genes, 97.5% are 
present in the assembled sequence. 

Annotation 

The initial annotation of the assembled genome 
concentrated on two tasks: prediction of tran-
script and protein sequence, and prediction of 
function for each predicted protein. Computa-
tional approaches can aid each task, but biolo-
gists with expertise in particular fields are re-
quired for the results to have the most consis-
tency, reliability, and utility. Because the 
breadth of expertise necessary to annotate a 
complete genome does not exist in any single 
individual or organization, we hosted an "An-
notation Jamboree" involving more than 40 
scientists from around the world, primarily 
from the Drosophila research community. Each 
was responsible for organizing and interpreting 
the gene set for a given protein family or bio- 
logical process. Over a 2-week period, jambo-
ree participants worked to define genes, to 
classify them according to predicted function, 
and to begin synthesizing information from a 
genome-wide perspective. 

Table 3. BAC and P1 clone-based sequencing. EDGP, European Drosophila Genome Project; BCM, Baylor College of Medicine; LBNL, Lawrence Berkeley National 
Laboratory (BCM and LBNL are the genomic sequencing centers of the BDGP). 

Clone-based genomic sequencing. Draft clones and draft coverage: draft sequence in joint assembly [BACs, (P1s)]†. 
Total BACs (P1s): total sequenced BACs (P1s) in joint assembly. Additional BACs: additional sequenced BACs in tiling path. 

Chromosomal region  Group  Size (Mb)  Finished sequence (Mb)  Draft clones  Draft coverage  Total BACs (P1s)  Additional BACs 
X (1-3)             EDGP   3          2.5                     0                             0                 0 
X (4-11)            BCM    8.8        0.1*                    0             2.3X            1                 72 
X (12-20)           LBNL   10         0                       71                            71                10 
2L                  LBNL   23         14.0                    103 (8)       1.6X (5.3X)     119 (202)         2 
2R                  LBNL   21.4       8.8                     159 (32)      1.3X (4.7X)     157 (186)         0 
3L                  BCM    24.4       0.1                     166           1.3X            170               50 
3L                  LBNL              2.1                     22 (7)        1.7X (2.5X)     20 (32)           0 
3R                  LBNL   28         2.1                     259 (9)       1.2X (2X)       264 (27)          0 
4                   LBNL   1.2        0                       16            1.4X            15                1 
Total                      120        29.7                    796 (56)                      817 (447)         135 

Gap closure: current status. Percentage of DNA sequence in contigs greater than 

Chromosomal region  30 kb  100 kb  1 Mb 
X                   79.4   32.7    0 
2L                  97.8   91.4    16.9 
2R                  96.4   90.6    32.8 
3L                  95.1   77.7    0 
3R                  98.5   92.6    3.6 
4                   85.6   43.5    0 
Total               93.7   77.5    9.9 

*Sequenced at LBNL. †A tiling path of clones spanning 97% of the euchromatic portion of the genome was selected from the genome [...] 
sequencing. The data include sequence that has been generated since the beginning of the publicly funded (BDGP and EDGP) genome sequencing [...] 
were verified by screening the shotgun sequence for expected STS and BAC end-sequences, sequenced genes with known map locations from [...] (and regions flanking P insertions), 
and sequences of neighboring tiling path clones. The average size of BAC clones in the tiling path is 163 kb. Sequencing methods are described in (66). 

24 MARCH 2000 VOL 287 SCIENCE www.sciencemag.org 

THE DROSOPHILA GENOME 

Fig. 3. Assembly status of the Drosophila genome. Each chromosome arm is 
depicted with information on content and assembly status: (A) transposable 
elements, (B) gene density, (C) scaffolds from the joint assembly, (D) 
scaffolds from the WGS-only assembly, (E) polytene chromosome divisions, 
and (F) clone-based tiling path. Gene density is plotted in 50-kb windows; 
the scale is from 0 to 30 genes per 50 kb. Gaps between scaffolds are 
represented by vertical bars in (C) and (D). Clones colored red in the tiling 
path have been completely sequenced; clones colored blue have been 
draft-sequenced. Gaps shown in the tiling path do not necessarily mean that 
a clone does not exist at that position, only that it has not been sequenced. 
Each chromosome arm is oriented left to right, such that the centromere is 
located at the right side of X, 2L, and 3L and the left side of 2R and 3R. 

For definition of gene structure, we relied on 
the use of different gene-finding approaches: 
the gene-finding programs Genscan (17) and a 
version of Genie that uses expressed sequence 
tag (EST) data (18), plus the results of comple-
mentary DNA (cDNA) and protein database 
searches, followed by review by human anno-
tators (19). Genscan predicted 17,464 genes, 
and Genie predicted 13,189. We believe that 
the lower estimate is more accurate, because in 
a test that used the extensively studied and 
annotated 2.9-Mb Adh region (3), the Genie 
predictions were closer to the number of exper- 
imentally determined genes; Genscan predicted 
far too many (20). This is likely because Genie 
was optimized for Drosophila, whereas Gen-
scan parameters suitable for Drosophila gene-
finding are not available. 

Results of the computational analyses were 
presented to annotators by means of a custom 
visualization tool that allowed annotators to de-
fine transcripts on the basis of EST (21) and 
protein sequence similarity information, Genie 
predictions, and Genscan predictions, in de-
creasing order of confidence. The present anno-
tation of the Drosophila genome predicts 13,601 
genes, encoding 14,113 transcripts through al-
ternative splicing in some genes. The number of 
alternative splice forms that can be annotated is 
limited by the available cDNA data and is a 
substantial underestimate of the total number of 
alternatively spliced genes. More than 10,000 
genes with database matches were reviewed 
manually. The remaining ~3000 genes were 
predicted by Genie but have no database match- 
es that can be used to refine intron-exon bound- 
aries. Genes predicted by Genscan that did not 
overlap Genie predictions or database matches 
were not included in the set of predicted pro- 
teins. Table 5 summarizes the evidence for these 
genes: 38% of the Genie predictions are sup- 
ported by evidence from both EST and protein 
matches, 27% by ESTs alone, and 12% by 
protein matches alone. Altogether there are EST 
matches for 65% of the genes, but nearly half of 
the total ESTs match only 5% of the genes; 23%
of the predicted proteins do not match sequences 
from other organisms or Drosophila ESTs. This
set of annotations is considered provisional and
will improve as additional full-length cDNA
sequence and functional information becomes
available for each gene. Figure 4 provides a
graphical overview of the gene content of the
fly.
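
The evidence percentages quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated in the text; the variable names are illustrative.

```python
# Evidence categories for predicted genes, as percentages quoted in the text.
evidence_pct = {
    "EST and protein match": 38,
    "EST match only": 27,
    "protein match only": 12,
}

# Genes with any EST support: both kinds of evidence, plus EST-only.
est_supported = evidence_pct["EST and protein match"] + evidence_pct["EST match only"]

# Whatever remains has neither an EST nor a protein match.
unsupported = 100 - sum(evidence_pct.values())

print(est_supported, unsupported)
```

The two derived values correspond to the 65% of genes with EST matches and the 23% of predicted proteins without any match.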

Genes were classified according to a func-
tional classification scheme called Gene Ontol-
ogy (GO). The GO project (22) is a collabora-
tion among FlyBase, the Saccharomyces Ge-
nome Database (23), and Mouse Genome In-
formatics (24). It consists of a set of controlled
vocabularies providing a consistent description
of gene products in terms of their molecular
function, biological role, and cellular location.
At the time of our annotation, proteins encoded 
by 1539 Drosophila genes had already been 
annotated by FlyBase using ~1200 different
GO classifications. In addition, a set of 718
proteins from S. cerevisiae and 1724 proteins
from mouse had been annotated and placed into 
GO categories. Predicted Drosophila genes and 
gene products were used as queries against a 
database made up of the sequences of these 
three sets of proteins (by BLASTX or 
BLASTP) (25) and grouped on the basis of the 
GO classification of the proteins matched. 
About 7400 transcripts have been assigned to 
39 major functional categories, and about 4500 
have been assigned to 47 major process cate- 
gories (Table 6). 
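
The grouping step described above (assign each query to the GO category of the protein it matches) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the identifiers, and the use of a single best hit per query are all assumptions, and the BLAST search itself is taken as already done.

```python
from collections import defaultdict

def group_by_go(best_hits, go_of_reference):
    """Group query transcripts by the GO category of their best database hit.

    best_hits: dict mapping query id -> reference protein id (None if no hit)
    go_of_reference: dict mapping reference protein id -> GO category label
    """
    groups = defaultdict(list)
    for query, hit in best_hits.items():
        # Queries with no hit, or a hit lacking a GO label, stay unclassified.
        category = go_of_reference.get(hit, "unclassified")
        groups[category].append(query)
    return dict(groups)

# Hypothetical identifiers, for illustration only.
go_of_reference = {"refA": "DNA binding", "refB": "protein kinase"}
best_hits = {"tx1": "refA", "tx2": "refB", "tx3": "refA", "tx4": None}
groups = group_by_go(best_hits, go_of_reference)
print(groups)
```

Counting the members of each group gives per-category transcript totals of the kind reported in Table 6.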

The largest predicted protein is Kakapo, a
cytoskeletal linker protein required for adhesion
between and within cell layers, with 5201 amino
acids; the smallest is the 21-amino acid ribo-
somal protein L38. There are 56,673 predicted
exons, an average of four per gene, occupying
24.1 Mb of the 120-Mb euchromatic sequence
total. The size of the average predicted transcript
is 3058 bp. There was a systematic underpredic-
tion of 5' and 3' untranslated sequence as a
result of less than complete EST coverage and
the inability of gene-prediction programs to pre-
dict the noncoding regions of transcripts, so the
number of exons and introns and the average
transcription unit size are certain to be underes-
timates. There are at least 41,000 introns, occu-
pying 20 Mb of sequence. Intron sizes in Dro-
sophila are heterogeneous, ranging from 40 bp
to more than 70 kb, with a clear peak between
59 and 63 bp (26). The average number of
exons is four, although this is an underestimate
because of a systematic underprediction of 5'
and 3' untranslated exons. We identified 292
transfer RNA genes and 26 genes for spliceoso-
mal small nuclear RNAs (snRNAs). We did not
attempt to predict other noncoding RNAs.

Table 4. Measures of completion. Analyses supporting many of these values are found in (77).

Number of scaffolds mapped to chromosome arms                       134
Number of scaffolds not mapped to chromosomes                       704
Number of base pairs in scaffolds mapped to chromosome arms         116.2 Mb
Number of base pairs in scaffolds not mapped to chromosome arms     3.8 Mb
Largest unmapped scaffold                                           64 kb
Percentage of total base pairs in mapped scaffolds >100 kb          98.2%
Percentage of total base pairs in mapped scaffolds >1 Mb            95.5%
Percentage of total base pairs in mapped scaffolds >10 Mb           68.0%
Number of gaps remaining among mapped scaffolds                     1299
Base pair accuracy against LBNL BACs (nonrepetitive sequence)       99.99%
Known genes accounted for in scaffold set                           99.7%

The total number of protein-coding genes,
13,601, is less than that predicted for the worm
C. elegans (27) (18,425; WormPep 18, 11 Oc-
tober 1999) and far less than the ~27,000 esti-
mated for the plant Arabidopsis thaliana (28).
The average gene density in Drosophila is one
gene per 9 kb. There is substantial variation in
gene density, ranging from 0 to nearly 30 genes
per 50 kb, but the gene-rich regions are not
clustered as they are in C. elegans. Regions of
high gene density correlate with G+C-rich se-
quences. In the ~1 Mb adjacent to the centric
heterochromatin, both G+C content and gene
density decrease, although there is not a marked
decrease in EST coverage as has been seen in A.
thaliana (28).
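
The summary statistics in this section (exons per gene, gene density, coding fraction) follow from the totals quoted in the text. The sketch below recomputes them. The mean intron size is derived here rather than stated in the source, and because the text gives "at least 41,000" introns it should be read as a rough figure.

```python
# Totals quoted in the text.
genes = 13_601
exons = 56_673
euchromatin_mb = 120.0   # euchromatic sequence, in Mb
exon_mb = 24.1           # sequence occupied by exons, in Mb
introns = 41_000         # "at least" this many
intron_mb = 20.0         # sequence occupied by introns, in Mb

exons_per_gene = exons / genes                    # "an average of four per gene"
kb_per_gene = euchromatin_mb * 1000 / genes       # "one gene per 9 kb"
exon_fraction = exon_mb / euchromatin_mb          # share of euchromatin in exons
mean_intron_bp = intron_mb * 1e6 / introns        # derived, not stated in the text

print(round(exons_per_gene, 1), round(kb_per_gene), round(exon_fraction, 2), round(mean_intron_bp))
```

A mean intron size of several hundred base pairs alongside a peak at 59 to 63 bp is consistent with the heterogeneous, long-tailed size distribution the text describes.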

Genomic Content 

The genomic sequence has shed light on some 
of the processes common to all cells, such as 
replication, chromosome segregation, and iron 
metabolism. There are also new findings about 
important classes of chromosomal proteins that 
allow insights into gene regulation and the cell 
cycle. Overall, the correspondence of Drosoph-
ila proteins involved in gene expression and
metabolism to their human counterparts reaf-
firms that the fly represents a suitable experi-
mental platform for the examination of human
disease networks involved in replication, repair,
translation, and the metabolism of drugs and
toxins. In an accompanying manuscript (29),
the protein complement of Drosophila is com-
pared to those of the two eukaryotes with com-
plete genome sequences, C. elegans and S.
cerevisiae, and other developmental and cell 
biological processes are discussed. 

Replication. Genes encoding the basic
DNA replication machinery are conserved
among eukaryotes (30); in particular, all of the
proteins known to be involved in start site
recognition are encoded by single-copy genes
in the fly. These include members of the six-
subunit heteromeric origin recognition complex
(ORC) (31), the MCM helicase complex (32),
and the regulatory factors CDC6 and CDC45,
which are thought to determine processing of
pre-initiation complexes. The fly ORC3 and
ORC6 proteins, for example, share close se-
quence similarity with vertebrate proteins, but
not only are they highly divergent relative to
yeast ORCs, they have no obvious counterparts
in the worm. It is striking that the ORC genes
exist as single copies, given the orthologous 
functions for some of the subunits in other 
processes (33). It had been considered possible 
that a large family of ORCs, each with a dif- 
ferent binding specificity, might account for 
different origin usage in development. Clearly,
given the single-copy ORC genes, other as-yet-
undiscovered cis-acting elements and trans-act-
ing factors participate in developmentally reg-
ulated processes such as switches in origin us-
age, gene amplification, and specialized repli-
cation of euchromatin in certain endocycles. In
contrast, the fly has two distinct homologs of
the proliferating cell nuclear antigen (PCNA),
the processivity factor for the DNA poly-
merases (δ and ε) involved in chain elongation.
Human PCNA is blocked from interaction with
the replication enzymes by the checkpoint reg-
ulator p21 in response to DNA damage (34);
perhaps one of the fly PCNA proteins is im-
mune to such regulation and is thus left active
for repair or replication.

Chromosomal proteins. Analysis of pro-
tein families involved in chromosome inheri-
tance reveals both expected findings and some
surprises. As expected, the fly has all four
members of the conserved SMC family in-
volved in sister chromatid cohesion, condensa-
tion, DNA repair, and dosage compensation
(35). The fly also contains at least one ortholog
of each of the MAD/Bub metaphase-anaphase
checkpoint proteins that are conserved from
yeast to mammals. However, Drosophila does
not appear to have orthologs to most of the
proteins identified previously in mammals or
yeast that are associated with centromeric
DNA, such as the CENP-C/MIF-2 family and
the yeast CBF3 complex (36). One exception is
the presence of a histone H3-like protein that
shares sequence similarity with mammalian
CENP-A, a centromere-specific H3-like pro-
tein. There are at least nine histone acetyl trans-
ferases (HATs) and five histone deacetylases
(HDACs), which are involved in regulating
chromatin structure (37); only three of each
have been reported previously. There are also
17 members of the SNF2 adenosine triphos-
phatase (ATPase) family, which represent 9
of the 10 known subfamilies. Many of these
ATPases are involved in chromatin remodeling
(38). The fly also contains at least 14 proteins
with chromodomains (39), six of which are
new, including two HP1-related proteins. Al-
though many of these chromodomain-contain-
ing proteins have orthologs in vertebrates, only
one (CHD1) appears in yeast, flies, and verte-
brates. There are also at least 13 bromodomain-
containing proteins, seven of which are new;
the bromodomain may interact with the acety-
lated NH2-terminus of histones and is involved
in chromatin remodeling and gene silencing
(40). Only three of these appear to have coun-
terparts in yeast. Furthermore, Drosophila telo-
meres lack the simple repeats that are charac-
teristic of most eukaryotic telomeres (41), and
the known telomerase components of verte-
brates, for example, are absent from flies. The
fly does, however, contain five proteins that are
close relatives of the yeast and human SIR2
telomere silencing proteins.
DNA repair. The importance of DNA re-
pair in maintaining genomic integrity is reflect-
ed in the conservation of most proteins impli-
cated in the major defined pathways of eukary-
otic DNA repair. However, there are some no-
table absences. For example, no convincing
homologs can be found for the genes encoding
the RAD7, RAD16, RAD26 (CSB/ERCC6),
and RAD28 (CSA) proteins, which are impli-
cated in strand-specific modes of repair in yeast
and/or mammalian systems. In base excision
repair processes, 3-methyladenine glycosylase
and uracil-DNA-glycosylase are absent, al-
though the latter function is likely fulfilled by
the G/T mismatch-specific thymine DNA gly-
cosylase (42). In the damage bypass pathway,
sequences encoding homologs of DNA poly-
merase ζ (yeast Rev3p, Drosophila mus205)
and Rev1p are present, although a REV7 ho-
molog is not found. As in humans and worms,
two members of the RAD30 (polymerase η)
gene family are present. In the mismatch repair
system, only two proteins related to Escherich-
ia coli mutS are predicted, rather than the usual
family of five or more members. The previous-
ly reported Msh2p homolog (43) is present, as
is a sequence most closely resembling Msh6p.
Budding yeast and humans possess additional
members of the mutS gene family that are
proposed to function in partially redundant
pathways of mismatch repair (MSH3) and in
meiotic recombination (MSH4 and MSH5),
suggesting either that the Drosophila mutS ho-
mologs have reduced specificity or that alterna-
tive proteins are fulfilling these roles in the fly.
In the recombinational repair pathway, two ad-
ditional members of the recA/RAD51 gene
family are identified, bringing the total to four.
However, no member of the RAD52/RAD59
family is present. One additional member of the
recQ/SGS1 helicase family was identified, in
addition to the two already noted (44); the new
protein is most similar to human RecQ4. Final-
ly, with respect to nonhomologous end joining,
Drosophila joins the list of invertebrate species
that lack an apparent DNA-PK catalytic sub-
unit, although both Ku subunits and DNA li-
gase 4 are present. We conclude that most
major components of the repair network in flies
have been uncovered. If more are present, either
they have diverged so far that they are unrec-
ognizable by BLAST searches, or the systems
have become degenerate (that is, other network
components are fulfilling the same roles).

Transcription. Gene regulation has tradi-
tionally been singled out as one of the primary
bases for the generation of evolutionary diver-
sity. How has the core transcriptional machin-
ery changed in different phyla? Drosophila
core RNA polymerase II and some general
transcription factors (TFIIA-H, TFIIIA, and
TFIIIB) are similar in composition to those of
both mammals and yeast (45). In contrast, core
RNA polymerases I and III, TBP (TATA-bind-
ing protein)-containing complexes for class I,
class II, and snRNA genes (TBP-associated
factors TAFI and TAFII, and SNAPc, respec-
tively), TFIIIC, and SRB/mediator vary greatly
in composition in Drosophila and mammals
relative to yeast (46). The RNA polymerase I
transcription factors of flies and mammals have
clear amino acid conservation; yeast RNA
polymerase I factors do not appear to be related
to them. For example, the mammalian promoter
interacting factors UBF and TIF-1A are present
in Drosophila but not in yeast, and yeast UAF
subunits are absent in Drosophila and apparent-
ly absent in mammals. Furthermore, of the
three TAFIs in the human selectivity factor 1,
the mouse transcriptional initiation factor IB,
and the yeast core factor complexes, only the
human/mouse TAFI63/TAFI68 subunit is con-
served in the fly. Similarly, Drosophila encodes
three of the five mammalian SNAPc subunits
(SNAP43, 50, and 190) for which no homologs
exist in the yeast genome.

In addition to the family of previously de-
scribed TBPs (47), the fly contains multiple
forms of several ubiquitous TAFIIs (TAFII30β,
TAFII60, and TAFII80) (46). This raises the
possibility that a variety of TFIID complexes
evolved in metazoan organisms to regulate
gene expression patterns associated with devel-
opment and cellular differentiation. The con-
stellation of factors that interact with RNA
polymerase II in Drosophila may also contrib-
ute to this regulation, because Drosophila con-
tains only a small subset of yeast SRB/mediator
subunits (MED6, MED7, and SRB7) but a vast
majority of the molecularly characterized com-
ponents of mammalian coactivator complexes
such as ARC/DRIP/TRAP.

Table 5. Summary of the gene predictions in Drosophila. Gene prediction programs were used in
combination with searches of protein and EST databases.

Result                 Genie +     Genie     Genscan    No gene        Total
                       Genscan*    only†     only‡      prediction§
EST + protein match     6,040       288       239         49            6,616
EST match only          1,357       143       107         34            1,641
Protein match only      2,541       157       220         78            2,996
No match                1,980       307         0          0            2,348
Total                  11,918       895       627        161           13,601

*Genie and Genscan matches overlapped but were not necessarily identical. †Genie predictions in regions not
predicted by Genscan. ‡Genscan predictions in regions not predicted by Genie; in the absence of database matches,
>4000 Genscan predictions were not included in the annotated gene set. §Gene structures defined based on
database matches in the absence of gene predictions.

Gene regulation. On the basis of similar- 
ity to known proteins, Drosophila appears to 
encode about 700 transcription factors, about 
half of which are zinc-finger proteins. By 
contrast, the worm has about 500 transcrip-
tion factors, fewer than one-third of which are
zinc-finger proteins (29). Two additional 
classes play key roles in regulation: the 
homeodomain-containing and nuclear hor- 
mone receptor-type transcription factors. 

Homeodomain-containing proteins con-
trol a wide variety of developmental pro-
cesses. Twenty-two new homeodomain-
containing proteins were uncovered in our
analysis, bringing the total to more than
100. Ten of these were members of the
paired-box PRX superclass (48), some with
known vertebrate homologs: short stature ho-
meobox 2 (SHOX), cartilage homeoprotein 1
(CART), and the two retina-specific proteins
(VSX-1 and VSX-2) of goldfish. New mem-
bers were also found in the LIM and TGIF
classes. The two new LIM members contain a
homeobox and two copies of the LIM motif; the
two new TGIF members occur as a local tan-
dem duplication on the right arm of chromo-
some 2. We also found single new members of
the NK-2, muscle-specific homeobox, proline-
rich homeodomain (PRH), and BarH classes.
The new fly gene encoding NK-2 is a cognate
of the gene encoding the NKX-5.1 mouse pro-
tein. The new fly gene encoding muscle-specif-
ic homeobox is most similar to the gene encod-
ing the MSX-1 mouse protein involved in
craniofacial morphogenesis. The new fly gene
encoding PRH is most similar to a mouse gene
expressed in myeloid cells. The remaining ho-
meodomain-containing proteins are orphans:
One has similarity to the human H6 protein
involved in craniofacial development, and an-
other to HB9, a protein required for normal
development of the pancreas.

Nuclear hormone receptors (NRs) are 



Table 6. Gene Ontology (GO) classification of Drosophila gene products.
Each of the 14,113 predicted transcripts was searched by BLAST against a
database of proteins from fly, yeast, and mouse that had been assigned
manually to a function and/or process category in the GO system.
Function categories were reviewed manually, and in many cases a Dro-
sophila protein was assigned to a different category upon careful inspec-
tion. The number of transcripts assigned to each process category is
the result of computational searches only. For functions, the number of
transcripts assigned and manually reviewed in each category is shown
(with the results of the computational search in parentheses). Certain
cases illustrate the value of the manual inspection. For example, motor
proteins initially included many coiled-coil domain proteins incorrectly
assigned to this category by the computational search. Supplemental data
are available at www.celera.com.

Function                                      Number of transcripts

Nucleic acid binding                          1387 (1370)
DNA binding                                   919 (652)
DNA repair protein                            65 (30)
DNA replication factor                        38 (18)
Transcription factor                          694 (418)
RNA binding                                   259 (205)
Ribosomal protein                             128 (116)
Translation factor                            69 (68)
Transcription factor binding                  21 (116)
Cell cycle regulator                          52 (104)
Chaperone                                     159 (158)
Motor protein                                 98 (373)
Actin binding                                 93 (64)
Defense/immunity protein                      47 (41)
Enzyme                                        2422 (2021)
Peptidase                                     468 (456)
Endopeptidase                                 378 (387)
Protein kinase                                236 (307)
Protein phosphatase                           93 (93)
Enzyme activator                              9 (19)
Enzyme inhibitor                              68 (92)
Apoptosis inhibitor                           15 (17)
Signal transduction                           622 (554)
Receptor                                      337 (336)
Transmembrane receptor                        261 (280)
G protein-linked receptor                     163 (160)
Olfactory receptor                            48 (49)
Storage protein                               12 (27)
Cell adhesion                                 216 (271)
Structural protein                            303 (302)
Cytoskeletal structural protein               106 (54)
Transporter                                   665 (517)
Ion channel                                   148 (188)
Neurotransmitter transporter                  33 (18)
Ligand binding or carrier                     327 (391)
Electron transfer                             124 (117)
Cytochrome P450                               88 (84)
Ubiquitin                                     11 (17)
Tumor suppressor                              10 (5)
Function unknown/unclassified                 7576 (7654)
Conserved hypothetical                        (1474)

Process                                       Number of transcripts

Cell growth and maintenance                   3894
Metabolism                                    2274
Carbohydrate metabolism                       53
Energy pathways                               69
Electron transport                            8
Nucleotide and nucleic acid metabolism        1078
DNA metabolism                                64
DNA replication                               57
DNA repair                                    110
DNA packaging                                 112
Transcription                                 735
Amino acid and derivative metabolism          69
Protein metabolism                            685
Protein biosynthesis                          215
Protein folding                               52
Protein modification                          273
Proteolysis and peptidolysis                  81
Protein targeting                             51
Lipid metabolism                              111
Monocarbon compound metabolism                6
Coenzymes and prosthetic group metabolism     23
Transport                                     336
Ion transport                                 72
Small molecule transport                      109
Mitochondrial transport                       43
Ion homeostasis                               8
Intracellular protein traffic                 116
Cell death                                    50
Cell motility                                 9
Stress response                               223
Defense (immune) response                     149
Organelle organization and biogenesis         417
Mitochondrion organization and biogenesis     5
Cytoskeleton organization and biogenesis      390
Cytoplasm organization and biogenesis         7
Cell cycle                                    211
Cell communication                            530
Cell adhesion                                 228
Signal transduction                           279
Developmental processes                       486
Sex determination                             7
Physiological processes                       201
Sensory perception                            64
Behavior                                      54
Process unknown/unclassified                  8884

sequence-specific, ligand-dependent tran-
scription factors that contribute to physio-
logical homeostasis by functioning as both
transcriptional activators and repressors.
Examination of the fly genome revealed
only four additional NR members, bringing
the total to 20. In contrast, the NR family
represents the most abundant class of tran-
scriptional regulators in the worm: More
than 200 member genes have been de-
scribed. One of the newly identified fly
NRs possesses a new P-box element (Cys-
Asp-Glu-Cys-Ser-Cys-Phe-Phe-Arg-Arg),
which confers DNA binding specificity,
bringing to 76 the number of P-boxes
identified to date in all species. A search of
the Drosophila genome failed to identify
any homologs to the mammalian p160 gene
family of NR coactivator proteins. SMRTER,
despite weak similarity to the mammalian
corepressors SMRT and N-CoR, appears to
be the only close relative in Drosophila.

Translation and RNA processing. Al-
though the structure of the ribosome has been
well worked out, it has become apparent that 
many ribosomal proteins are multifunctional 
and are involved in processes as disparate as 
DNA repair and iron-binding (49). There has 
been an enormous genetic investigation of the 
consequences of changes in expression level 
of Drosophila ribosomal proteins (the Minute 
phenotype) (50); the identification and map-
ping of the complete set presented here will
provide the basis for in-depth dissections of 
their functions and disease roles. 

Most genes encoding general translation 
factors are present in only one copy in the 
Drosophila genome, as they are in other ge- 
nomes studied to date; however, we discov- 
ered six genes encoding proteins highly sim- 
ilar to the messenger RNA (mRNA) cap- 
binding protein eIF4E. These may add com- 
plexity to regulation of cap-dependent 
translation, which is central to cellular 
growth control. Caenorhabditis elegans has 
three eIF4E isoforms, which were hypothe- 
sized to be necessary because trans-spliced 
mRNAs possess a different cap structure than 
do other mRNAs (51); however, Drosophila
does not have trans-spliced mRNAs. The ac- 
tivity of eIF4E is regulated by an inhibitor 
protein, 4E-BP. The Drosophila genome con- 
tains only a single gene encoding 4E-BP; in 
contrast, mammals have at least three 4E-BP 
isoforms but perhaps fewer eIF4E isoforms
than do flies. Of the more than 200 RNA- 
binding proteins identified, the most frequent 
structural classes are RRM proteins (114), 
DEAD- or DExH-box helicases (58), and 
KH-domain proteins (31). This distribution is 
similar to that observed in the C. elegans 
genome. These structural motifs are some- 
times found in proteins for which experimen- 
tal evidence indicates a function in DNA, 
rather than RNA, binding. Overall, the trans-
lational machinery appears well conserved
throughout the eukaryotes.

The process of nonsense-mediated decay
(52), the accelerated decay of mRNAs that
cannot be translated throughout their entire
length, has been genetically characterized in
yeast and C. elegans but not in Drosophila. We
found homologs of UPF1/SMG-2, SMG-1, and
SMG-7 in the Drosophila genome, indicating
that this process is conserved in flies.

Of particular interest are genes for compo-
nents of the minor, or U12, spliceosome (53).
U12-type introns are known in mammals, Drosoph-
ila, and Arabidopsis, but not C. elegans. Using
conservative criteria (including a perfect match
to the U12 consensus 5' splice site for nucleo-
tides 2 to 7, TATCCT), we found one intron
that appears to be of the U12 type per 1000
genes. As expected, the minor spliceosome
snRNAs U12, U4atac, and U6atac are present
in the Drosophila genome. However, neither
U11 nor the U11-associated 35-kD protein (54)
could be identified in the sequence. It is possi-
ble that these components of the minor spliceo-
some are less well conserved, or that the minor
spliceosome in Drosophila does not contain
them.
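
The one criterion the text spells out, a perfect match to TATCCT at intron nucleotides 2 to 7, can be sketched as a simple filter. This is an illustration only: the text says the criteria "include" this match, so the real screen applied further, unstated conditions, and the sequences below are made up.

```python
U12_MOTIF = "TATCCT"  # consensus quoted in the text for intron positions 2-7

def looks_like_u12(intron_seq):
    """Return True if intron positions 2-7 (1-based) match TATCCT exactly."""
    seq = intron_seq.upper()
    # 1-based positions 2..7 correspond to the 0-based slice [1:7].
    return seq[1:7] == U12_MOTIF

# Hypothetical intron sequences (5' end first), for illustration only.
introns = {
    "intron_a": "ATATCCTTTGACCA",  # positions 2-7 are TATCCT
    "intron_b": "GTAAGTATCCTGCA",  # motif present, but not at positions 2-7
    "intron_c": "gtatccttaacag",   # lower case; positions 2-7 are TATCCT
}
candidates = [name for name, seq in introns.items() if looks_like_u12(seq)]
print(candidates)
```

Anchoring the comparison to a fixed slice, rather than searching for the motif anywhere in the intron, is what makes this a test of the 5' splice site itself.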

Cytochrome P450. The cytochrome P450 
monooxygenases (CYPs) are a large and an- 
cient superfamily of proteins that carry out 
multiple reactions to enable organisms to rid 
themselves of foreign compounds. Human 
CYP2D6, for example, influences the metab- 
olism of beta blockers, antidepressants, anti- 
psychotics, and codeine, and insect CYPs 
function in the synthesis or degradation of
hormones and pheromones and in the metab- 
olism of natural and synthetic toxins, includ- 
ing insecticides (55). We found 90 P450 fly 
genes, of which four are pseudogenes, a fig-
ure that is comparable to the 80 CYPs of C.
elegans. These 90 genes, some of which are
clustered, are divided among 25 families, five 
of which are found in Lepidoptera, Co- 
leoptera, Hymenoptera, Orthoptera, and 
Isoptera. However, more than half of the 90 
genes belong to only two families, CYP4 and 
CYP6, the former family shared with verte- 
brates. CYP51, used in making cholesterol in 
animals and related molecules in plants and 
fungi, is absent from both the fly and worm 
genomes; it is well known that the fly must 
obtain cholesterol from its diet. A compre- 
hensive collection of phylogenetically di- 
verse CYP sequences is available (56). 

Solute transport. Solute transporters 
contribute to the most basic properties of 
living systems, such as establishment of cell 
potential or generation of ATP; in higher 
eukaryotes, these proteins help mediate ad- 
vanced functions such as behavior, learning, 
and memory. Hydropathy analyses predict 
that 20% of the gene products in Drosophila 
reside in cellular membranes, having four or 
more hydrophobic α helices (57). A consid-
erable fraction of these proteins (657, or 4%)
are dedicated to ion and metabolite move- 
ment. More than 80% of the annotated trans- 
porters are new to Drosophila and were iden- 
tified by similarity to proteins characterized 
in other eukaryotes. The largest families are 
sugar permeases, mitochondrial carrier pro- 
teins, and the ATP-binding cassette (ABC) 
transporters, with 97, 38, and 48 genes, re- 
spectively; these families are also the most 
common in yeast and C. elegans (29). Also of
note are three families of anion transporters 
that mediate flux of sulfate, inorganic phos-
phate, and iodide. Na+-anion transporters,
with 17 members, are particularly abundant 
relative to worm and yeast. Although individ-
ual members of these families have been
investigated, for example the mitochondrial
carrier protein COLT required for gas-filling
of the tracheal system (58) and the ABC
transporters associated with eye pigment dis-
tribution (59), the variety and number of
transporters within each family are impres-
sive. These data lay the foundation for under-
standing global transport processes critical to 
Drosophila physiology and development. 
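
The text does not say which hydropathy method was used. As a rough illustration of the idea, the sketch below counts hydrophobic stretches with a sliding-window average over the Kyte-Doolittle scale. The choice of scale, the 19-residue window, the 1.6 threshold, and the function names are all assumptions, not details from the source.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the source does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def count_hydrophobic_segments(protein, window=19, threshold=1.6):
    """Count non-overlapping windows whose mean hydropathy meets the threshold."""
    seq = protein.upper()
    count = 0
    i = 0
    while i + window <= len(seq):
        # Unknown residue codes contribute a neutral 0.0.
        mean = sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean >= threshold:
            count += 1
            i += window  # jump past this segment so it is counted once
        else:
            i += 1
    return count

def is_multipass_candidate(protein, min_segments=4):
    """Flag proteins with four or more predicted hydrophobic segments."""
    return count_hydrophobic_segments(protein) >= min_segments

# Hypothetical toy sequences, for illustration only.
toy_membrane = ("L" * 20 + "DKDKDKDKDK") * 4
toy_soluble = "DK" * 20
print(count_hydrophobic_segments(toy_membrane), is_multipass_candidate(toy_soluble))
```

Real predictors add refinements such as helix length limits and topology rules, so this count should be taken only as a first-pass screen.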

Metabolic processes. The biosynthetic 
networks of the fly are remarkably complete 
compared to those of many different pro- 
karyotes and to yeast, in which key enzymes 
of various pathways may be missing (60). As 
in vertebrates, many fly enzymes are encoded 
by multiple genes. Two families are notewor- 
thy because of their size. The triacylglycerol 
lipases are encoded by 31 genes and merit 
consideration in investigations of lipolysis 
and energy storage and redistribution. In ad- 
dition, there are 32 genes encoding uridine 
diphosphate (UDP) glycosyltransferases,
which participate in the production of sterol
glycosides and in the biodegradation of hy-
drophobic compounds. Several UDP glyco-
syltransferase genes are highly expressed in
the antennae and may have roles in olfaction. 
In vertebrates, these enzymes are critical to 
drug clearance and detoxification (61). A ma- 
jor challenge will be to determine whether the 
number of these proteins present in the ge- 
nome is correlated with the importance and 
complexity of the regulatory events involved 
in any given enzymatic reaction. 

Iron (Fe) is both essential for and toxic
to all living things, and metazoan ani-
mals use similar strategies for obtaining,
transporting, storing, and excreting iron. 
Three findings from the analysis of the 
genome shed light on the underlying com- 
mon mechanisms that have escaped atten- 
tion in the past. First, a third ferritin gene 
has been found that probably encodes a 
subunit belonging to a cytosolic ferritin, the 
predominant type in vertebrates. This find- 
ing indicates that intracellular iron storage 
mechanisms in flies might be very similar 
to those in vertebrates. Subunits of the 
predominant secreted ferritins in insects are
encoded by two highly expressed autosom-
al genes (62). Second, the dipteran trans-
ferrins studied so far appear to play antibi-
otic rather than iron-transport roles; one
such transferrin was previously character-
ized in Drosophila (63). We have now
identified two additional transferrins. The 
conservation of iron-binding residues and 
COOH-terminal hydrophobic sequences in 
these new transferrins suggests that they 
are homologs of the human melanotrans- 
ferrin p97. The latter is anchored to the 
cells and mediates iron uptake indepen- 
dently from the main vertebrate pathway 
that involves serum transferrin and its re- 
ceptor (64). Third, proteins homologous to 
vertebrate transferrin receptors appear to be 
absent from the fly. Thus, the Drosophila 
homologs of the vertebrate melanotrans- 
ferrin could mediate the main insect path- 
way for cellular uptake of iron and possibly 
of other metal and nonmetal small ligands. 
This appears to be an ancestral mechanism, 
and the exploration of these findings should 
be crucial in bringing together what has 
seemed to be divergent iron homeostasis 
strategies in vertebrates and insects. 

This initial look at the genomic basis of 
the fly's fundamental biochemical pathways 
reveals that its biosynthetic networks are fair-
ly consistent with those of worm and human.
On the other hand, there are a number of new 
findings. The large diversity of transcription 
factors, including several hundred zinc-finger 
proteins and novel homeodomain-containing 
proteins and nuclear hormone receptors, is 
likely related to the substantial regulatory 



Fig. 4. Coding content of the fly genome. Each 
predicted gene in the genome is depicted as a 
box color-coded by similarity to genes from 
mammals, C eiegans. and S. cerews/ae. A leg- 
end appears at the end of each chromosome 
arm describing the components of each panel 
In order from the top, they are (A) scale in 
megabases, (B) poiytene chromosome divi- 
sions, (C) CC content in a range from 25 to 
65%, (D) transposable elements, and genes on 
the (E) plus and (F) minus strands. The width of 
each gene element represents the total genom- 
ic length of the transcription unit The height of 
each gene element represents EST coverage: 
The shortest boxes have no EST matches, me- 
dium-size boxes have 1 to 12 EST matches, and 
the tallest boxes have 1 3 or more EST matches. 
The color code for sequence similarity appears 
on each side of the fold-out figure. The graphics 
for this figure were prepared using gff2ps (68). 
Each gene has been assigned a FlyBase identi- 
fier (FBgn) in addition to the Celera identifier 
(CT#). Access to supporting information on 
each gene is available through FlyBase at 
http://flybase.bio.indiana.edu. These data are 
also available through a graphical viewing tool 
at FlyBase (http://flybase.bio.indiana.edu) and 
Celera (www.celera.com), with additional sup- 
porting information. 



complexity of the fly. In addition, many of 
the genes involved in core processes are sin- 
gle-copy genes and thus provide starting 
points for detailed studies of phenotype, free 
of the complications of genetically redundant 
relatives. 

Concluding Remarks 

Genome assembly relied on the use of several 
types of data, including clone-based se- 
quence, whole-genome sequence from librar- 
ies with three insert sizes, and a BAC-based 
STS content map. The combination of these 
resources resulted in a set of ordered contigs 
spanning nearly all of the euchromatic region 
on each chromosome arm. We are taking 
advantage of the cloned DNA available from 
both the clone-based and whole-genome sub- 
clones to fill the gaps between contigs; 331 
have been filled, and the remainder are in 
progress. 

It is useful to consider the relative con- 
tributions of the various data types to the 
finished product with respect to how simi- 
lar programs might be carried out in the 
future. The BAC end-sequences and STS 
content map provided the most informative 
long-range sequence-based information at 
the lowest cost. Both BAC ends and STS 
map were necessary to link scaffolds to 
chromosomal locations. A higher density of 
BAC end-sequences, from libraries pro- 
duced with a larger diversity of restriction 
enzymes (or even from a random-shear li- 
brary), would have resulted in larger scaf- 
folds at lower shotgun sequence coverage; 
this is our primary recommendation for 
future projects. Although the clone-based 
draft sequence data did not result in a mark- 
edly different extent of scaffold coverage 
compared to assembly without the clone- 
based data, they were useful in the resolu- 
tion of repeated sequences, particularly in 
the transition zones between euchromatin 
and centric heterochromatin. In terms of 
sequence coverage, adequate scaffold size 
was obtained with whole-genome sequence 
coverage as low as 6.5 X (7/). The assem- 
bly algorithm did not take any specific 
advantage of the fact that each draft se- 
quence read from a BAC clone came from 
a defined region of the genome. Adding 
this feature could mean that adequate ge- 
nome assembly could be obtained at lower 
whole-genome sequence coverage. Conti- 
guity and scaffold size continued to in- 
crease with increased coverage, and so a 
decision to proceed with additional se- 
quencing versus more directed gap closure 
should be driven by available resources. 

The assembled sequence has allowed a 
first look at the overall Drosophila genome 
structure. As previously suspected, there is 
no clear boundary between euchromatin 
and heterochromatin. Rather, over a region 



of ~ 1 Mb, there is a gradual increase in the 
density of transposable elements and other 
repeats, to the point that the sequence is 
nearly all repetitive. However, there are 
clearly genes within heterochromatin, and 
we suspect that most of our 3.8 Mb of 
unmapped scaffolds represent such genes, 
both near the centromeres and on the Y 
chromosome (which is almost entirely het- 
erochromatic). Access to these sequences 
was an unexpected benefit of the WGS 
approach. 

The genome sequence and the set of 13,601 
predicted genes presented here are considered 
Release 1. Both will evolve over time as addi- 
tional sequence gaps are closed, annotations are 
improved, cDNAs are sequenced, and genes are 
functionally characterized. The diversity of pre- 
dicted genes and gene products will serve as the 
raw material for continued experimental work 
aimed at unraveling the molecular mechanisms 
underlying development, behavior, aging, and 
many other processes common to metazoans 
for which Drosophila is such an excellent 
model. 
Protein Kinases 6 




The eukaryotic protein kinases comprise one of the 
largest superfamilies of homologous proteins and 
genes. Within this family, there are now hundreds of 
different members whose sequences are known. Al- 
though there is a rich diversity of structures, regulation 
modes, and substrate specificities among the protein 
kinases, there are also common structural features. 
These conserved structural motifs provide clear indica- 
tions as to how these enzymes manage to transfer the 
γ-phosphate of a purine nucleotide triphosphate to the 
hydroxyl groups of their protein substrates. The 
authors of this review have carried out a monumental 
task of analyzing and collating the amino acid se- 
quences of all reported protein kinases and defining 
the conserved structural features that characterize the 
portion of these proteins that is responsible for their 
catalytic activity. Comparison of the sequences in the 
catalytic fragment of the protein kinases has been used 
to arrange these enzymes in evolutionary trees that 
group subfamilies of closely related enzymes. It is com- 
forting that the structural relationships that emerge 
from these trees result in groupings that also reflect 
related functions. The work presented in this review 
seems to be an excellent example of the type of analy- 
sis that will become indispensable in the coming years, 
as more and more sequence information becomes avail- 
able to biologists as a result of the genome projects. 



ABSTRACT The eukaryotic protein kinases make up a 
large superfamily of homologous proteins. They are re- 
lated by virtue of their kinase domains (also known as 
catalytic domains), which consist of ~250-300 amino acid 
residues. The kinase domains that define this group of 
enzymes contain 12 conserved subdomains that fold into 
a common catalytic core structure, as revealed by the 
3-dimensional structures of several protein-serine ki- 
nases. There are two main subdivisions within the super- 
family: the protein-serine/threonine kinases and the 
protein-tyrosine kinases. A classification scheme can be 
founded on a kinase domain phylogeny, which reveals 
families of enzymes that have related substrate specifici- 
ties and modes of regulation. Hanks, S. K., Hunter, T. 
The eukaryotic protein kinase superfamily: kinase (cata- 
lytic) domain structure and classification. FASEB J. 9, 
576-596 (1995) 

Key Words: protein-tyrosine kinase • protein-serine ki- 
nase • protein phosphorylation • cAMP-dependent protein kinase 

THE EUKARYOTIC PROTEIN KINASE SUPERFAMILY 

One of the largest known protein superfamilies is made 
up of protein kinases identified largely from eukaryotic 
sources. (The term superfamily will be used here to dis- 
tinguish this broad collection of enzymes from smaller, 
more closely related subsets that have been commonly 
referred to as families). These enzymes use the γ-phos- 
phate of ATP (or GTP) to generate phosphate 
monoesters using protein alcohol groups (on Ser and 
Thr) and/or protein phenolic groups (on Tyr) as phos- 
phate acceptors. The protein kinases are related by virtue 
of their homologous kinase domains (also known as cata- 
lytic domains), which consist of ~250-300 amino acid 
residues (reviewed in refs 1-3; and see below). During the 
past 15 years, previously unrecognized members of the 
eukaryotic protein kinase superfamily have been uncov- 
ered at an exponentially increasing rate and currently 
appear in the literature almost weekly. This pace of dis- 
covery can be attributed to the past development of mo- 
lecular cloning and sequencing technologies and, more 
recently, to the advent of the polymerase chain reaction 
(PCR),* which facilitated the use of homology-based clon- 
ing strategies. Consequently, about 200 different superfa- 
mily members (products of distinct paralogous genes) 
had been recognized from mammalian sources alone. 
The prediction made several years ago (4) that the mam- 
malian genome contains about 1000 protein kinase genes 
(roughly 1% of all genes) would still appear to be within 
reason, and may even be an underestimate (5). 

In addition to mammals and other vertebrates, eu- 
karyotic protein kinase superfamily members have been 
identified and characterized from a wide range of other 
animal phyla as well as from plants, fungi, and protozo- 
ans. Hence, the protein kinase progenitor gene can be 
traced back to a time before the evolutionary separation 
of the major eukaryotic kingdoms. The identification of 
eukaryotic-like protein kinase genes in prokaryotes (6, 7) 
raises the possibility that the protein kinase progenitor 
gene might have arisen before the divergence of 
prokaryotes and eukaryotes (see below). Studies of the 
budding and fission yeasts, Saccharomyces cerevisiae and 
Schizosaccharomyces pombe, have been particularly fruitful 
in the recognition of new protein kinases. In these geneti- 
cally tractable organisms, the powerful approach of mu- 
tant isolation and cloning by complementation has netted 
dozens of protein kinase genes required for numerous 
aspects of cell function (8). In many cases, vertebrate 
counterparts have now been found for these genes, lead- 
ing to a growing awareness that protein phosphorylation 
pathways that regulate basic aspects of cell physiology 
have been maintained throughout the course of eu- 
karyotic evolution. 

'This article is based on an introductory chapter in the Protein 
Kinase Factsbook, edited by D. G. Hardie and S. K. Hanks, publish- 
ed in 1995 by Academic Press, London. 

*To whom correspondence and reprint requests should be 
addressed, at: Molecular Biology and Virology Laboratory, The 
Salk Institute, 10010 N. Torrey Pines Rd., La Jolla, CA 92037, 
USA. 

'Abbreviations: PCR, polymerase chain reaction; PKA-Cα, 
type α cAMP-dependent protein kinase catalytic subunit; Cdk2, 
cyclin-dependent kinase 2; Erk2, p42 MAP kinase; APE, 

Even though the overwhelming majority of protein ki- 
nases identified from eukaryotic sources belong to this 
superfamily, a small but growing number of such enzymes 
do not qualify as superfamily members. Most of these are 
related to the prokaryotic protein-histidine kinase family 
(see below), which forms the sensor components of two- 
component signal transduction systems (9). Included in 
this category are a putative ethylene receptor encoded by 
the flowering plant ETR1 gene (10), the product of the 
budding yeast SLN1 gene (11, 12) thought to be involved 
in relaying nutrient information to elements controlling 
cell growth and division, the mitochondrial 
branched-chain α-ketoacid dehydrogenase kinase (13), 
and the mitochondrial pyruvate dehydrogenase kinase 
(14). In prokaryotes, protein-histidine kinases phosphory- 
late aspartates in their target proteins, but except for the 
two dehydrogenase kinases that phosphorylate serine, the 
acceptor specificities of most of the eukaryotic protein 
kinases of this type are not known. In addition to these 
protein kinases, the Bcr protein encoded by the breakpoint 
cluster region gene involved in the Philadelphia chromo- 
some translocation (15) and the A6 kinase isolated by 
expression cloning using an anti-phosphotyrosine anti- 
body (16) have kinase domains unrelated to any known 
eukaryotic or prokaryotic kinase. In addition, true pro- 
tein-histidine kinases are known in eukaryotes. One such 
enzyme has been extensively characterized from budding 
yeast but not yet molecularly cloned (17), and so it is not 
clear whether this enzyme will belong to the protein ki- 
nase superfamily or use a novel structural principle for 
phosphotransfer. 

What about the prokaryotes? It has been known for 
years that protein phosphorylation events play key regu- 
latory roles in numerous bacterial cell processes including 
chemotaxis, bacteriophage infection, nutrient uptake, 
and gene transcription (reviewed in refs 18, 19). The 
bacterial protein kinases have been divided into three 
general classes (20): 1) protein-histidine kinases such as 
those functioning in two-component sensory regulatory 
systems (strictly speaking, these are protein-aspartyl ki- 
nases, because autophosphorylation on His is an interme- 
diary step in phosphotransfer to an aspartate in the 
response-regulator protein) (9); 2) phosphotransferases 
such as those of the phosphoenolpyruvate-dependent 
phosphotransferase system involved in sugar uptake (21); 
and 3) protein-serine kinases such as isocitrate dehydro- 
genase kinase/phosphatase (22). Amino acid sequences 
have been determined for members of each class, and all 
are unrelated to the eukaryotic protein kinase superfa- 
mily. 

Recently, however, true homologs of the eukaryotic 
protein kinases have been identified from two species of 
bacteria, Yersinia pseudotuberculosis (7) and Myxococcus xan- 
thus (6, 23). Are these special cases, or the first examples 
of many such genes in prokaryotes? The eukaryotic-like 
protein kinase YpkA from the pathogenic enterobacteria 
Y. pseudotuberculosis is encoded by a plasmid essential for 
the virulence of this infectious organism. In addition to 
YpkA, at least two other proteins encoded by genes resid- 
ing on the virulence plasmid exhibit high similarity to 
eukaryotic proteins. Thus, it seems likely that the viru- 
lence plasmid genes were transduced from a eukaryotic 
host by horizontal transfer. The myxobacterium M. xan- 
thus presents a different and perhaps more intriguing 
picture. Application of the PCR homology-based cloning 
strategy revealed that at least eight genes encoding mem- 
bers of the eukaryotic protein kinase superfamily are pre- 
sent in the genome of this species (23). The myxobacteria 
are unusual prokaryotes in that they undergo a complex 
developmental cycle upon nutrient depletion, much like 
that of the eukaryotic slime mold Dictyostelium. Given that 
protein kinases are commonly involved in regulating 
growth and differentiation of eukaryotic cells, it is attrac- 
tive to speculate that the eukaryotic-like protein kinases 
in M. xanthus are specifically involved in regulating their 
developmental cycle. Indeed, one of these kinases, Pkn1, 
was shown to be required for proper fruiting body forma- 
tion. The same could be true for the eukaryotic-like pro- 
tein kinase PknA from Anabaena (24). In keeping with this 
idea, neither the PCR approach applied to Escherichia coli 
(23) nor extensive sequencing of the E. coli genome (now 
30% complete) has yielded eukaryotic-like protein ki- 
nases. Hence, genes encoding members of the eukaryotic 
protein kinase superfamily may be present only in bacte- 
ria that can undergo a developmental cycle. However, 
unpublished reports of eukaryotic-like protein kinases in 
Streptomyces coelicolor, and in three species of Methanococ- 
cus, suggest that such genes are more widely expressed 
among prokaryotes, and potentially these genes represent 
the ancestors for the entire eukaryotic protein kinase su- 
perfamily. 

THE HOMOLOGOUS KINASE DOMAINS 

The kinase domains of eukaryotic protein kinases impart 
the catalytic activity. Three separate roles can be ascribed 
to the kinase domains: 1) binding and orientation of the 
ATP (or GTP) phosphate donor as a complex with diva- 
lent cation (usually Mg²⁺ or Mn²⁺); 2) binding and orien- 
tation of the protein (or peptide) substrate; and 3) 
transfer of the γ-phosphate from ATP (or GTP) to the 
acceptor hydroxyl residue (Ser, Thr, or Tyr) of the pro- 
tein substrate. 



Conserved features of primary structure 

The total number of distinct kinase domain amino acid 
sequences available is now approaching 400 (Table 1). 
Included in this total are the vertebrate enzymes encoded 
by distinct paralogous genes, their presumed functional 
homologs from invertebrates and simpler organisms (en- 
coded by orthologous genes), and those identified from 
lower organisms and plants for which vertebrate equiva- 
lents have not been found. Conserved features of kinase 
domain primary structure have previously been identified 
through an inspection of multiple amino acid sequence 
alignments (1-3). The large number of sequences now 
available precludes showing an alignment containing all 
known kinase domains. Thus, in Fig. 1 only 60 different 
kinase domain sequences are aligned. These are drawn, 
however, from the widest possible sampling of the super- 
family and thus provide a good representation of the 
AGC Group 

AGC-I. Cyclic nucleotide-regulated protein kinase family 
A. Cyclic AMP-dependent protein kinase (PKA) subfamily 
vertebrate: 
1. PKA-Cα: PKA catalytic subunit, alpha-form 
2. PKA-Cβ: PKA catalytic subunit, beta-form 
3. PKA-Cγ: PKA catalytic subunit, gamma-form 
Drosophila melanogaster: 
1. DmPKA-C0: PKA catalytic subunit, C0 form 
2. DmPKA-C1: PKA catalytic subunit, C1 form 
3. DmPKA-C2: PKA catalytic subunit, C2 form 
Caenorhabditis elegans: 
1. CePKA: PKA catalytic subunit homolog 
Saccharomyces cerevisiae: 
1. ScPKA-Tpk1: PKA catalytic subunit homolog, type 1 
Schizosaccharomyces pombe: 
1. SpPKA1: PKA catalytic subunit homolog 
Dictyostelium discoideum: 
1. DdPKA: PKA catalytic subunit 
Aplysia californica: 
1. AplC: PKA catalytic subunit homolog 
2. Sak: "Spermatozoon-associated kinase" 



B. Cyclic GMP-dependent protein kinase (PKG) subfamily 
vertebrate: 
1. PKG-I: PKG, type I 
♦ 2. PKG-II: PKG, type II 
Drosophila melanogaster: 
1. DmPKG-G1: PKG homolog, type 1 
2. DmPKG-G2: PKG homolog, type 2 

C. Others 
Dictyostelium discoideum: 
1. DdPK1: PKA homolog 

AGC-II. Diacylglycerol-activated/phospholipid-dependent protein kinase C (PKC) family 

A. "Conventional" (Ca²⁺-dependent) protein kinase C (cPKC) subfamily 
vertebrate: 
1. cPKCα: Protein Kinase C, alpha-form 
2. cPKCβ: Protein Kinase C, beta-form 
3. cPKCγ: Protein Kinase C, gamma-form 
Drosophila melanogaster: 
1. DmPKC-53Ebr: PKC homolog expressed in brain, locus 53E 
2. DmPKC-53Eey: PKC homolog expressed in eye, locus 53E 
Aplysia californica: 
1. Apl-I: PKC homolog, type I 

B. "Novel" (Ca²⁺-independent) Protein Kinase C (nPKC) subfamily 
vertebrate: 
1. nPKCδ: Protein Kinase C, delta-form 
2. nPKCε: Protein Kinase C, epsilon-form 
3. nPKCη: Protein Kinase C, eta-form 
4. nPKCθ: Protein Kinase C, theta-form 
Drosophila melanogaster: 
1. DmPKC-98F: PKC homolog, locus 98F 
Aplysia californica: 
1. Apl-II: PKC homolog, type II 
Caenorhabditis elegans: 
1. CePKC: PKC homolog, product of tpa1 gene 
♦ 2. CePKC1B: PKC homolog expressed in neurons and interneurons 
Dictyostelium discoideum: 
♦ 1. DdMHCK: PKC homolog 
Saccharomyces cerevisiae: 
1. ScPKC1: PKC homolog, product of PKC1 gene 
♦ 2. ScPKC2: PKC homolog, product of PKC2 gene 
Schizosaccharomyces pombe: 
1. Pck1: "Pombe C kinase", type 1 
2. Pck2: "Pombe C kinase", type 2 

C. "Atypical" Protein Kinase C (aPKC) subfamily 
vertebrate: 
1. aPKCζ: Protein Kinase C, zeta-form 
♦ 2. aPKCι: Protein Kinase C, iota-form 
♦ 4. aPKCμ: Protein Kinase C, mu-form 



•More information about the individual protein kinases listed (including sequence references) can be obtained by contacting the authors or by 
consulting The Protein Kinase Factsbook (42). Protein kinases marked with asterisks (♦) were not included in the phylogenetic analysis due to their 
recent discovery. In many instances new protein kinases were cloned by more than one group; in these cases the most commonly accepted name is 
used for the entry and alternative names are listed in parentheses after the entry. Protein kinase homologs from DNA viruses are not included in 
this classification. 



578 Vol. 9 May 1995 



The FASEB Journal 



HANKS AND HUNTER 



SERIAL REVIEW 



Table 1. (continued). 



D. Others 
vertebrate: 
♦ 1. PKN: Protein kinase with PKC-related catalytic domain 

AGC-III. Related to PKA and PKC (RAC) family 
vertebrate: 
1. RACα: RAC, alpha-form; cellular homolog of v-Akt oncoprotein 
2. RACβ: RAC, beta-form 
Drosophila: 
1. DmRAC: RAC homolog 
Caenorhabditis elegans: 
♦ 1. CeRAC: RAC homolog 

AGC-IV. Family of kinases that phosphorylate G protein-coupled receptors 
1. βARK1: β-adrenergic receptor kinase, type 1 
2. βARK2: β-adrenergic receptor kinase, type 2 
3. RhK: Rhodopsin kinase 
♦ 4. IT11: G protein-coupled receptor kinase homolog 
♦ 5. GRK5: G protein-coupled receptor kinase, type 5 
♦ 6. GRK6: G protein-coupled receptor kinase, type 6 

1. DmGPRK1: Drosophila G protein-coupled receptor kinase, type 1 
2. DmGPRK2: Drosophila G protein-coupled receptor kinase, type 2 

AGC-V. Family of budding yeast AGC-related kinases 
1. Sch9: Suppressor of defects in cAMP effector pathway 
2. Ykr2: AGC-related kinase 
3. Ypk1: AGC-related kinase 

AGC-VI. Family of kinases that phosphorylate ribosomal S6 protein 
1. S6K: 70 kDa S6 kinase with single catalytic domain 
2. RSK1(Nt): 90 kDa S6 kinase, type 1 
3. RSK2(Nt): 90 kDa S6 kinase, type 2 

[Note: The RSK enzymes have two distinct catalytic domains. The Nt-do^ 

Ct-domain is most closely related to phosphorylase kinase] 

AGC-VII. Budding yeast Dbf2/20 Family 
Saccharomyces cerevisiae: 
1. Dbf2: Product of gene periodically expressed in cell cycle 
2. Dbf20: Close relative of DBF2 not under cell cycle control 

AGC-VIII. Flowering plant "PVPK1 Family" of protein kinase homologs 
Phylum Angiospermophyta (Kingdom Plantae): 
1. PvPK1: Bean protein kinase homolog 
2. OsG11A: Rice protein kinase homolog 
3. ZmPPK: Maize protein kinase homolog 
4. AtPK5: Arabidopsis protein kinase homolog 
5. AtPK7: Arabidopsis protein kinase homolog 
6. AtPK64: Arabidopsis protein kinase homolog 
7. PsPK5: Pea protein kinase homolog 

Other AGC-related kinases 
1. DMPK: "Myotonic Dystrophy Protein Kinase" 
2. Sgk: "Serum and glucocorticoid regulated kinase" 
♦ 3. Mast205: Spermatid "Microtubule-associated serine/threonine kinase" 
Neurospora crassa: 
1. NcCot-1: Product of gene required for normal colonial growth 
Dictyostelium discoideum: 
1. Ddk2: Product of developmentally-regulated gene 
Saccharomyces cerevisiae: 
1. ScSpk1: Dual-specificity kinase 
Phylum Angiospermophyta (Kingdom Plantae): 
♦ 1. Atpk1: Arabidopsis protein kinase 

CaMK Group 

CaMK-I. Family of kinases regulated by Ca²⁺/Calmodulin, and close relatives 
A. Subfamily including "Multifunctional" Ca²⁺/Calmodulin Kinases (CaMKs) 
vertebrate: 
1. CaMK1: CaMK, type I 
2. CaMK2α: CaMK, type II, alpha subunit 
3. CaMK2β: CaMK, type II, beta subunit 
4. CaMK2γ: CaMK, type II, gamma subunit 
5. CaMK2δ: CaMK, type II, delta subunit 
♦ 6. EF2K: Elongation Factor-2 Kinase or CaMK type III 
7. CaMK4: CaMK, type IV 



Drosophila melanogaster: 
1. DmCaMK2: CaMK-II homolog 
Saccharomyces cerevisiae: 
1. ScCaMK2-1: CaMK-II homolog, product of CMK1 gene 
2. ScCaMK2-2: CaMK-II homolog, product of CMK2 gene 
Aspergillus nidulans: 
1. AnCaMK2: CaMK-II homolog 

B. Subfamily including phosphorylase kinases 
vertebrate: 
1. PhK-γM: Skeletal muscle phosphorylase kinase catalytic subunit 
2. PhK-γT: Male germ cell phosphorylase kinase catalytic subunit 
3. RSK1(Ct): 90 kDa S6 kinase, type 1; C-terminal catalytic domain 
4. RSK2(Ct): 90 kDa S6 kinase, type 2; C-terminal catalytic domain 

C. Subfamily including myosin light chain kinases 
vertebrate: 
1. skMLCK: Skeletal muscle MLCK (rabbit) 
2. smMLCK: Smooth muscle MLCK (rabbit) 
3. Titin: Huge protein implicated in skeletal muscle development 
Caenorhabditis elegans: 
1. Twn: "Twitchin" protein involved in muscle contraction or development 
Dictyostelium discoideum: 
1. DdMLCK: Slime mold myosin light chain kinase 

D. Subfamily of plant kinases with intrinsic calmodulin-like domain 
Phylum Angiospermophyta (Kingdom Plantae): 
1. CDPK: Soybean Ca²⁺-regulated kinase with intrinsic CaM-like domain 
2. AtAK1: Arabidopsis CDPK homolog 
♦ 3. OsSpk: Rice CDPK homolog 
♦ 4. DcPk431: Carrot CDPK homolog 

E. Subfamily of plant kinases with highly acidic domain 
Phylum Angiospermophyta (Kingdom Plantae): 
♦ 1. ASK1: Arabidopsis protein kinase homolog with highly acidic domain 
♦ 2. ASK2: Arabidopsis protein kinase homolog with highly acidic domain 

F. Other CaMK-related kinases 
vertebrate: 
1. PskH1: Putative protein-serine kinase 
♦ 2. MAPKAP2: "MAP Kinase-Activated Protein Kinase 2" 
Saccharomyces cerevisiae: 
1. Mre4: Protein required for meiotic recombination 
♦ 2. Dun1: Protein required for DNA damage-inducible gene expression 
♦ 3. Rck1: "Radiation sensitivity complementing kinase, type 1" 
♦ 4. Rck2: "Radiation sensitivity complementing kinase, type 2" 

CaMK-II. Snf1/AMPK family 
vertebrate: 
♦ 1. AMPK: "AMP-Activated Protein Kinase" 
2. p78: Protein lost in carcinomas of human pancreas 
Saccharomyces cerevisiae: 
1. Snf1: Kinase essential for release from glucose repression 
2. Kin1: Protein kinase with N-terminal catalytic domain 
3. Kin2: Close relative of KIN1 
4. Ycl24: Protein kinase homolog on chromosome III 
♦ 5. Ycl453: Protein kinase homolog on chromosome XI 
Schizosaccharomyces pombe: 
1. SpKin1: Product of gene important for growth polarity 
2. Nim1: Inducer of mitosis 
Phylum Angiospermophyta (Kingdom Plantae): 
1. PSnf1-RKIN1: Rye putative protein kinase that complements yeast snf1 polarity 
2. PSnf1-AKIN10: Arabidopsis putative protein kinase related to SNF1 
3. PSnf1-BKIN12: Barley protein related to SNF1 
♦ 4. PKABA1: Wheat kinase induced by abscisic acid 
♦ 5. WPK4: Wheat kinase homolog regulated by light and nutrients 
♦ 6. NPK5: Tobacco Snf1 homolog, activates SUC2 gene expression 

Other CaMK Group Kinases 
Plasmodium falciparum (malarial parasite): 
1. PfCPK: Ca²⁺-regulated kinase with intrinsic CaM-like domain 
2. PfPK2: Putative protein kinase 

CMGC Group 

CMGC-I. Family of cyclin-dependent kinases (CDKs) and other close relatives 
vertebrate: 
1. Cdc2: Inducer of mitosis; functional homolog of yeast cdc2+/CDC28 kinases (Cdk1) 
2. Cdk2: Type 2 cyclin-dependent kinase 
3. Cdk3: Type 3 cyclin-dependent kinase 
4. Cdk4: Type 4 cyclin-dependent kinase 
5. Cdk5: Type 5 cyclin-dependent kinase 
6. Cdk6: Type 6 cyclin-dependent kinase 
7. PCTAIRE1: Cdc2-related protein 
8. PCTAIRE2: Cdc2-related protein 
9. PCTAIRE3: Cdc2-related protein 
10. Mo15: "Cdk-activating kinase"; Negative regulator of meiosis (CAK) 
Drosophila melanogaster: 
1. DmCdc2: Functional homolog of yeast cdc2+/CDC28 kinases 
2. DmCdc2c: Cdc2-cognate protein; Cdk2 homolog 
Dictyostelium discoideum: 
1. DdCdc2: Functional homolog of yeast cdc2+/CDC28 kinases 
2. DdPRK: "Cdc2-related PCTAIRE Kinase" 
Aspergillus nidulans: 
1. NIMXCdc2: Cdc2-related gene product 
Plasmodium falciparum: 
1. PfPK5: Cdc2-related protein from human malarial parasite 
Entamoeba histolytica: 
1. EhC2R: Cdc2-related protein 
Crithidia fasciculata: 
1. CfCdc2R: Cdc2-related protein 
Leishmania mexicana: 
♦ 1. LmCRK1: "Cdc2-Related Kinase" 
Saccharomyces cerevisiae: 
1. Cdc28: "Cell-division-cycle" gene product 
2. Pho85: Negative regulator of the PHO system and cell cycle regulator 
3. Kin28: CDC28-related protein 
Schizosaccharomyces pombe: 
1. SpCdc2: "Cell-division-cycle" gene product 
Histoplasma capsulatum: 
♦ 1. HcCdc2: Cdc2 homolog from dimorphic fungus 
Phylum Angiospermophyta (Kingdom Plantae): 
1. Pcdc2: Flowering plant Cdc2 homolog that complements yeast mutants 
♦ 2. MsCdc2B: Alfalfa Cdc2 cognate gene product that complements G1/S transition 
3. OsC2R: More distantly related Cdc2 homolog from rice 

CMGC-II. Erk (MAP kinase) family 
vertebrate: 
1. Erk1: "Extracellular signal-regulated kinase", type 1 (p44 MAP kinase) 
2. Erk2: "Extracellular signal-regulated kinase", type 2 (p42 MAP kinase) 
3. Erk3: Somewhat distant relative of the Erk/MAP kinases 
4. p63MAPK: Another more distant relative of the Erk/MAP kinases 
5. SAPK-α: "Stress-activated protein kinase, type alpha" (JNK2) 
6. SAPK-β: "Stress-activated protein kinase, type beta" 
7. SAPK-γ/Jnk1: "Stress-activated protein kinase, type gamma" or "Jun N-terminal Kinase" 
8. p38: HOG1-related protein (MPK2) 
Drosophila melanogaster: 
1. DmErkA: Homolog of Erk/MAP kinases; product of rolled gene 
Caenorhabditis elegans: 
♦ 1. Sur1: Erk/MAP kinase homolog 
Saccharomyces cerevisiae: 
1. Kss1: Suppressor of sst2 mutant, overcomes growth arrest 
2. Fus3: Product of gene required for growth and mating 
3. Slt2: Product of gene complementing lyt2 mutants (MPK1) 
♦ 4. Hog1: Product of gene required for osmoregulation 
Schizosaccharomyces pombe: 
1. Spk1: Product of gene that confers drug resistance to staurosporine, a PK inhibitor 
Phylum Deuteromycota (Kingdom Fungi): 
1. CaErk1: Protein that interferes with mating factor-induced cell cycle arrest 
Trypanosoma brucei (Phylum Zoomastigina, Kingdom Protoctista): 
♦ 1. KFR1: "KSS1- and FUS3-related" gene product 
Phylum Angiospermophyta (Kingdom Plantae): 
1. PErk: Flowering plant Erk/MAP kinase homologs (7 distinct homologs identified in Arabidopsis) 

CMGC-III. Glycogen synthase kinase 3 (GSK3) family 
vertebrate: 
1. GSK3α: Glycogen synthase kinase 3, α-form 
2. GSK3β: Glycogen synthase kinase 3, β-form 
Drosophila melanogaster: 
1. Sgg: Product of shaggy/zeste-white 3 gene 
Saccharomyces cerevisiae: 
1. Mck1: "Meiosis and centromere regulatory kinase" 
♦ 2. ScGSK3: Protein closely related to MCK1 
♦ 3. Mds1: Dosage suppressor of mck1 mutant 
Dictyostelium discoideum: 
♦ 1. DdGSK3: Glycogen synthase kinase 3 homolog 
Phylum Angiospermophyta (Kingdom Plantae): 
♦ 1. ASK-α: "Arabidopsis shaggy-related protein kinase", type alpha 
♦ 2. ASK-γ: "Arabidopsis shaggy-related protein kinase", type gamma 






vertebrate: 
1. CK2α: Casein kinase II, alpha subunit 
2. CK2α': Casein kinase II, alpha-prime subunit 
Drosophila melanogaster: 
1. DmCK2: Casein kinase II homolog 
Caenorhabditis elegans: 
1. CeCK2: Casein kinase II homolog 
Theileria parva (a protozoan parasite): 
1. TpCK2: Casein kinase II α-subunit homolog 
Dictyostelium discoideum: 
1. DdCK2: Casein kinase II, α-subunit 
Saccharomyces cerevisiae: 
1. ScCK2α: Casein kinase II, alpha subunit 
2. ScCK2α': Casein kinase II, alpha-prime subunit 
Schizosaccharomyces pombe: 
♦ 1. SpCka1: Casein kinase II, α-subunit homolog (Orb5) 
Phylum Angiospermophyta (Kingdom Plantae): 
1. ZmCK2: Flowering plant casein kinase II, α-subunit homolog 

CMGC-IV. Clk family 
vertebrate: 
1. Clk: "Cdc-like kinase" 
♦ 2. Srpk1: Kinase that regulates intracellular localization of splicing factors 
3. PskG1: Putative protein kinase 
4. PskH2: Putative protein kinase 
Drosophila melanogaster: 
♦ 1. Doa: Kinase encoded by "Darkener of Apricot" locus 
Saccharomyces cerevisiae: 
1. Yak1: Suppressor of RAS mutant 
2. Kns1: Nonessential protein kinase homolog 
Schizosaccharomyces pombe: 
1. Dsk1: Dis1-suppressing protein kinase implicated in mitotic control 
♦ 2. Prp4: Pre-mRNA processing gene product; lacks subdomains X-XI 

Other CMGC Group kinases 
vertebrate: 
1. Mak: "Male germ cell-associated kinase" 
2. Ched: "Cholinesterase-related cell division controller" 
3. PITSLRE: Galactosyltransferase-associated kinase 
4. KKIALRE: Cdc2-related protein 
♦ 5. PITALRE: Cdc2-related kinase 
♦ 6. PISSLRE: Cdc2-related kinase 
Saccharomyces cerevisiae: 
1. Sme1: Product of gene essential for start of meiosis 
2. Sgv1: Kinase required for G protein-mediated adaptive response to pheromone 
3. Ctk1: Product of gene required for normal growth 
Phylum Angiospermophyta (Kingdom Plantae): 
1. Mhk: Arabidopsis thaliana "Mak homologous kinase" 



Conventional Protein-Tyrosine Kinase Group (I-X: Non-membrane-spanning; XI-XXIII: Membrane-spanning)

PTK-I. Src family
vertebrate:
1. Src: Cellular homolog of Rous sarcoma virus oncoprotein
2. Yes: Cellular homolog of Yamaguchi 73 sarcoma virus oncoprotein
3. Yrk: Yes-related kinase
4. Fyn: Protein related to Fgr and Yes
5. Fgr: Cellular homolog of Gardner-Rasheed sarcoma virus oncoprotein
6. Lyn: Protein related to Fgr and Yes
7. Hck: Hematopoietic cell protein-tyrosine kinase
8. Lck: Lymphoid T-cell protein-tyrosine kinase
9. Blk: Lymphoid B-cell protein-tyrosine kinase
• 10. Frk: Fyn-related kinase
• 11. Rak: STK-related kinase
• 12. Fyk: "Fyn and Yes-related kinase" from electric ray
Drosophila melanogaster:
1. DmSrc: Src homolog, polytene locus 64B
Dugesia (Girardia) tigrina (Phylum Platyhelminthes):
• 1. DtSpk-1: "Src-like planarian kinase"
Hydra vulgaris (Phylum Cnidaria):
1. Stk: Src-related protein
Spongilla lacustris (Phylum Porifera):
1. Srk1-4: Four distinct Src-related kinases



PTK-II. Brk family
vertebrate:
• 1. Brk: Protein-tyrosine kinase expressed in human breast tumors

PTK-III. Tec family
vertebrate:
1. Tec: "Tyrosine kinase expressed in hepatocellular carcinoma"
2. Emt: "Expressed mainly in T-cells" kinase (Itk, Tsk)
3. Btk: "Bruton's agammaglobulinaemia tyrosine kinase" (Emb)
• 4. Txk: Tec-related protein-tyrosine kinase
Drosophila melanogaster:
1. DmTec: Tec homolog, polytene locus 28C

PTK-IV. Csk family
vertebrate:
1. Csk: "C-terminal Src Kinase"; negative regulator of Src
• 2. Matk: "Megakaryocyte-associated Tyr-kinase" (Hyl, Lsk, Ctk, Ntk)

PTK-V. Fes (Fps) family
vertebrate:
1. Fes/Fps: Cellular homolog of feline and avian sarcoma viruses
2. Fer: "Fes/Fps-related" kinase
Drosophila melanogaster:
1. DmFer: Fer-related protein

PTK-VI. Abl family
vertebrate:
1. Abl: Cellular homolog of Abelson murine leukemia virus
2. Arg: "Abl-related gene" product
Drosophila melanogaster:
1. DmAbl: Abl-related protein
Caenorhabditis elegans:
1. CeAbl: Nematode Abl-related protein

PTK-VII. Syk/Zap70 family
vertebrate:
1. Syk: "Spleen tyrosine kinase"
2. Zap70: T-cell receptor "zeta chain-associated protein of 70 kDa"
Hydra vulgaris (Phylum Cnidaria):
• 1. Htk16: Syk/Zap70-related

PTK-VIII. Jak family
vertebrate:
1. Tyk2: Transducer of interferon α/β signals
2. Jak1: "Janus kinase", type 1
3. Jak2: "Janus kinase", type 2
• 4. Jak3: "Janus kinase", type 3
Drosophila melanogaster:
• 1. Hop: Product of hopscotch gene required for establishing segmental body plan

PTK-IX. Ack
vertebrate:
• 1. Ack: "CDC42Hs-associated kinase"

PTK-X. Fak
vertebrate:
1. Fak: "Focal adhesion kinase"



PTK-XI. Epidermal growth factor receptor family
vertebrate:
1. EGFR: Epidermal growth factor receptor
2. ErbB2: Cell homolog of oncogene activated in ENU-induced rat neuroblastoma (Neu, HER2)
3. ErbB3: Receptor tyrosine kinase related to EGFR (HER3)
4. ErbB4: Receptor tyrosine kinase related to EGFR (Tyro2)
Drosophila melanogaster:
1. DER: Homolog of EGF receptor
Caenorhabditis elegans:
1. LET-23: Product of gene required for normal vulval development
Schistosoma mansoni (Phylum Platyhelminthes):
1. SER: EGF receptor homolog

PTK-XII. Eph/Elk receptor family
vertebrate:
1. Eph: Kinase detected in "erythropoietin-producing hepatoma"
2. Eck: "Epithelial cell kinase"
3. Eek: Eph/Elk-related protein-tyrosine kinase
4. Hek: Eph/Elk-related protein-tyrosine kinase (Cek4)
5. Sek: "Segmentally-expressed kinase"
6. Elk: "Eph-like kinase" detected in brain
• 7. Hek2: "Human embryo kinase" type 2 (Cek10)
• 8. Htk: "Hepatoma transmembrane kinase"
9. Cek5/Nuk: "Chicken embryo kinase 5"/"Neural kinase"
• 10. Ehk1: "Eph homology kinase-1" (Cek7)
• 11. Ehk2: "Eph homology kinase-2"
• 12. Myk1: "Mammary-derived tyrosine kinase, type 1"
13. Myk2: "Mammary-derived tyrosine kinase, type 2"
14. Cek9: "Chicken embryo kinase 9"
15. Pag: "Pagliaccio" Xenopus protein expressed in neural crest and neural tissues
16. Rtk1: Zebrafish Eph/Elk-related protein-tyrosine kinase
17. Rtk2: Zebrafish Eph/Elk-related protein-tyrosine kinase
18. Rtk3: Zebrafish Eph/Elk-related protein-tyrosine kinase

PTK-XIII. Axl family
vertebrate:
1. Axl: "Anexelekto" (Gr. "uncontrolled") tyrosine kinase (UFO, Ark)
2. Eyk: Cellular homolog of RPL30 avian oncoprotein (c-Ryk)
• 3. Brt/Sky/Tif/Rse: "Brain tyrosine kinase"/"Sea related protein tyrosine kinase"/"Tyrosine kinase with Ig-like and FN-III-like domains"/"Receptor sectaris" (Tyro3)

PTK-XIV. Tie/Tek family
vertebrate:
1. Tie: "Tyrosine kinase with Ig and EGF homology"
2. Tek: "Tunica interna endothelial cell kinase" (TIE2)



PTK-XV. Platelet-derived growth factor receptor family
A. Subfamily with 5 Ig-like extracellular domains
vertebrate:
1. PDGFRα: Platelet-derived growth factor receptor, type alpha
2. PDGFRβ: Platelet-derived growth factor receptor, type beta
3. CSF1R: Colony-stimulating factor-1 receptor (c-Fms)
4. Kit: Steel growth factor receptor
5. Flk2: "Fetal liver kinase-2" (Flt3)
B. Subfamily with 7 Ig-like extracellular domains
vertebrate:
1. Flt1: "Fms-like tyrosine kinase", type 1
2. Flt4: "Fms-like tyrosine kinase", type 4
3. Flk1: "Fetal liver kinase-1" (KDR)

PTK-XVI. Fibroblast growth factor receptor family
vertebrate:
1. FGFR1: Fibroblast growth factor receptor, type 1 (Flg, Cek1)
2. FGFR2: Fibroblast growth factor receptor, type 2 (Bek, K-SAM, Cek3)
3. FGFR3: Fibroblast growth factor receptor, type 3
4. FGFR4: Fibroblast growth factor receptor, type 4
Drosophila melanogaster:
1. DmFGFR1: Fibroblast growth factor receptor homolog, type 1
• 2. DmFGFR2: Fibroblast growth factor receptor homolog, type 2

PTK-XVII. Insulin receptor family
vertebrate:
1. InsR: Insulin receptor
2. IGF1R: Insulin-like growth factor receptor
3. IRR: Insulin receptor-related protein
Drosophila melanogaster:
1. DmInsR: Homolog of insulin receptor

PTK-XVIII. Ltk/Alk family
vertebrate:
1. Ltk: "Leukocyte tyrosine kinase"
• 2. Alk: "Anaplastic lymphoma kinase"

PTK-XIX. Ros/Sev family
vertebrate:
1. Ros: Cellular homolog of UR2 avian sarcoma virus oncoprotein
Drosophila melanogaster:
1. Sev: Product of sevenless gene required for R7 photoreceptor cell development

PTK-XX. Trk/Ror family
vertebrate:
1. Trk: High molecular weight nerve growth factor receptor
2. TrkB: Receptor for brain-derived neurotrophic factor and neurotrophin-3
3. TrkC: Trk-related protein; receptor for neurotrophin-3
4. Ror1: "Ror" putative receptor, type 1
5. Ror2: "Ror" putative receptor, type 2
6. TcRTK: Trk-related receptor (electric ray)
Drosophila melanogaster:
• 1. Dror: Putative neurotrophic receptor

PTK-XXI. Ddr/Tkt family
• 1. Ddr: "Discoidin Domain Receptor" (TrkE, CAK, NEP, Ptk3)
• 2. Tkt: "Tyrosine Kinase Related to Trk" (Tyro10)

PTK-XXII. Hepatocyte growth factor receptor family
vertebrate:
1. HGFR: Hepatocyte growth factor receptor (MET)
2. Sea: Cellular homolog of S13 avian erythroleukemia virus oncoprotein
3. Ron: "Recepteur d'Origine Nantaise"
• 4. Stk: "Stem cell-derived tyrosine kinase"

PTK-XXIII. Nematode Kin15/16 family
Caenorhabditis elegans:
1. CeKin15: PTK expressed during hypodermal development
2. CeKin16: PTK expressed during hypodermal development

Other membrane-spanning protein-tyrosine kinases (each with no close relatives)
vertebrate:
1. Ret: Normal homolog of oncoprotein activated by recombination
2. Klg: "Kinase-like gene" product
• 3. Nyk/Ryk: "Novel tyrosine kinase-related protein" (Vik, Mrk, Nbtk1)
Drosophila melanogaster:
1. Torso: Product of torso gene required for embryonic anterior/posterior determination
2. DmTrk: Distant relative of the mammalian trk gene
Marine sponge (Geodia cydonium):
• 1. GCTK: Putative receptor PTK



Other protein kinases (not falling into major groups)

O-I. Polo family
vertebrate:
1. Plk: "Polo-like kinase"
2. Snk: "Serum-inducible kinase"
• 3. Sak: Polo-related kinase isolated in screen for genes regulating sialylation
Drosophila melanogaster:
1. Polo: Protein kinase homolog required for mitosis
Saccharomyces cerevisiae:
1. Cdc5: Product of gene required for cell cycle progression

O-II. MEK/STE7 family
vertebrate:
1. MEK1: "MAP ERK Kinase", type 1
2. MEK2: "MAP ERK Kinase", type 2
Drosophila melanogaster:
1. Dsor1:
Saccharomyces cerevisiae:
1. Ste7: Kinase required for haploid-specific gene expression
2. Pbs2: Kinase required for antibiotic drug resistance
3. Mkk1: "MAP Kinase Kinase", type 1 (suppresses lysis defect of pkc1 mutant)
4. Mkk2: "MAP Kinase Kinase", type 2 (suppresses lysis defect of pkc1 mutant)
Schizosaccharomyces pombe:
1. Byr1: Kinase that suppresses ras1-mutant sporulation defect
2. Wis1: Suppressor of cdc phenotype in triple mutant cdc25/wee1/win1 strains

O-III. MEKK/Ste11 family
vertebrate:
• 1. MEKK: "MEK Kinase"
Saccharomyces cerevisiae:
1. Ste11: Protein required for cell-type-specific transcription
2. Bck1: "Bypass of C kinase" kinase
Schizosaccharomyces pombe:
1. Byr2: Product of gene required for pheromone signal transduction
Phylum Angiospermophyta (Kingdom Plantae):
1. NPK1: Flowering plant (tobacco) homolog of Bck1

O-IV. Pak/Ste20 family
vertebrate:
• 1. Pak: "p21-(Cdc42/Rac) activated kinase"
Saccharomyces cerevisiae:
1. Ste20: Product of gene required for pheromone response

O-V. NimA family
vertebrate:
1. Nek1: NimA-related kinase
• 2. Nek2: NimA-related kinase (Nlk1)
• 3. Nek3: NimA-related kinase
• 4. Nrk2: NimA-related kinase
• 5. Stk1: NimA-related kinase
Aspergillus nidulans:
1. NIMA: Cell cycle control protein kinase
Drosophila melanogaster:
1. Fused: Product of gene required for segment polarity
Trypanosoma brucei (Phylum Zoomastigina, Kingdom Protoctista):
1. NrkA: Trypanosome protein kinase related to NimA
Saccharomyces cerevisiae:
1. Kin3: Putative protein kinase

O-VI. wee1/mik1 family
vertebrate:
1. Wee1Hu: Gene product able to complement S. pombe wee1 mutant
Saccharomyces cerevisiae:
• 1. Swe1: Wee1 homolog from budding yeast
Schizosaccharomyces pombe:
1. SpWee1: "Wee" size at division kinase; Cdc2 negative regulator
2. Mik1: "Mitosis inhibitory kinase", negative regulator of Cdc2



O-VII. Family of kinases involved in translational control
vertebrate:
1. HRI: "Heme-regulated eukaryotic initiation factor 2α kinase"
2. PKR: "Double-stranded RNA-dependent kinase" (Tik)
Saccharomyces cerevisiae:
1. Gcn2: Protein required for translational derepression

O-VIII. Raf family
vertebrate:
1. Raf-1: Cellular homolog of retroviral oncogene product
2. A-Raf: Oncogenic protein closely related to c-Raf
3. B-Raf: Oncogenic protein closely related to c-Raf
Drosophila melanogaster:
1. DmRaf: Raf homolog
Caenorhabditis elegans:
1. CeRaf: Raf homolog, product of lin-45 gene required for vulval differentiation
Phylum Angiospermophyta (Kingdom Plantae):
1. Ctr1: Negative regulator of ethylene response pathway



O-IX. Activin/TGFβ receptor family
A. Subfamily of type I receptors
vertebrate:
1. ActR-I: Type I receptor for activin and TGF-β (Tsk7L, SKR1, ALK-2)
• 2. TSR1: Type I receptor for activin and TGF-β (ALK-1)
• 3. TGFβRI: Type I receptor TGF-β (ALK-5)
• 4. ActR-IB: Type I receptor for activin (ALK-4)
• 5. BRK-1: Type I receptor for BMP-2 and BMP-4 (ALK-3)
• 6. ALK-6: "Activin receptor-like kinase", type 6
Drosophila melanogaster:
• 1. DmAtr-I: Type I activin receptor homolog
• 2. DmSax: Product of saxophone gene
B. Subfamily of type II receptors
vertebrate:
1. ActRII: Type II receptor for activin
2. ActRIIB: Type II receptor for activin
3. TGFβRII: Type II receptor TGF-β
• 4. C14: Putative receptor kinase expressed in gonads
Drosophila melanogaster:
• 1. DmAtr-II: Type II activin receptor homolog
Caenorhabditis elegans:
• 1. DAF-4: Larva development regulatory protein; BMP receptor
C. Others
Caenorhabditis elegans:
1. DAF-1: Product of gene required for vulval development



O-X. Flowering plant putative receptor kinase family
Phylum Angiospermophyta (Kingdom Plantae):
1. ZmPK1: Putative receptor protein-serine kinase (maize)
2. Srk: "S receptor kinase"; three distinct alleles: 2, 6, and 910 (Brassica)
3. Tmk1: Putative "Transmembrane receptor kinase" (Arabidopsis)
4. Apk1: Kinase that phosphorylates Tyr, Ser, and Thr (Arabidopsis)
• 5. Nak: "Novel Arabidopsis Kinase" (Arabidopsis)
6. Pro25: Putative kinase selected for specificity to thylakoid membrane protein (Arabidopsis)
• 7. Pto: Product of gene conferring pathogen resistance (tomato)
• 8. TmkL1: Transmembrane protein with unusual kinase-like domain (Arabidopsis)
• 9. Prk1: Pollen-expressed receptor-like putative kinase (Petunia)

O-XI. Family of "mixed-lineage" kinases with leucine zipper domain
vertebrate:
• 1. Mlk1: "Mixed lineage kinase", type 1
• 2. Mlk2: "Mixed lineage kinase", type 2
• 3. Mlk3: "Mixed lineage kinase", type 3 (PTK1, SPRK)

Table 1. (continued).

O-XII. Casein kinase I family
vertebrate:
1. CK1α: Casein kinase I, type alpha
2. CK1β: Casein kinase I, type beta
3. CK1γ: Casein kinase I, type gamma
4. CK1δ: Casein kinase I, type delta
Saccharomyces cerevisiae:
1. Yck1: Budding yeast casein kinase I homolog, type 1
2. Yck2: Budding yeast casein kinase I homolog, type 2
3. Hrr25: Kinase required for DNA repair
Schizosaccharomyces pombe:
• 1. Hhp1: Fission yeast casein kinase I homolog, type 1
• 2. Hhp2: Fission yeast casein kinase I homolog, type 2

O-XIII. PKN family of prokaryotic protein kinases
Myxococcus xanthus (Phylum Myxobacteria; Kingdom Prokaryotae):
1. Pkn1: Protein kinase homologous to eukaryotic kinases
2. Pkn2: Protein kinase required for maintenance of stationary phase cells and development

Other protein kinase family members (each with no known close relatives)
vertebrate:
1. Mos: Cellular homolog of retroviral oncogene product
2. Pim1: Protooncogene activated by murine leukemia virus
3. Cot: Product of oncogene expressed in human thyroid carcinoma
4. Esk: "Embryonal carcinoma STY kinase"; dual specificity (PIT)
• 5. GC kinase: Kinase expressed in germinal center B cells
• 6. Slk: STE20-related kinase
• 7. LIMK: "LIM motif-containing kinase"
• 8. Tsk1: "Testis-specific kinase"
Drosophila melanogaster:
1. NinaC: Product of gene essential for photoreceptor function
2. Pelle: Product of gene required for dorsal-ventral polarity
• 3. Nemo: Product of gene required for rotation of photoreceptor clusters
Dictyostelium discoideum:
1. SplA: Spore lysis A protein kinase
2. Dpyk2: Developmentally-regulated tyrosine kinase, type 2
Ceratodon purpureus (a moss):
1. PhyCer: Putative protein-tyrosine kinase encoded by a phytochrome gene
Saccharomyces cerevisiae:
1. Cdc7: "Cell-division-cycle" control gene product
2. CDC15: "Cell-division-cycle" control gene product
3. Vps15: Product of gene essential for sorting to lysosome-like vacuole
4. Npr1: Product of gene required for activity of ammonia-sensitive amino acid permeases
5. Elm1: Product of gene required for yeast-like cell morphology
6. Ire1: Required for myo-inositol synthesis and signaling from ER to the nucleus
7. Ykl516: Putative protein kinase gene on chromosome XI
• 8. Ipl1: Product of gene required for chromosome segregation
Schizosaccharomyces pombe:
1. Ran1: Product of gene required for normal meiotic function
2. Chk1: "Checkpoint Kinase" that links rad pathway to Cdc2
• 3. Csk1: "Cyclin Suppressing Kinase"
4. RPK1: "Regulatory cell proliferation kinase"
Entamoeba histolytica (Phylum Rhizopoda, Kingdom Protoctista):
1. Ehm&l: Distant relative of Mos
Phylum Angiospermophyta (Kingdom Plantae):
1. GmPK6: Protein kinase homolog (soybean)
• 2. Tsl: Product of Tousled gene required for normal leaf/flower development (Arabidopsis)
Yersinia pseudotuberculosis (Phylum Omnibacteria, Kingdom Prokaryotae):
1. YpkA: Enterobacterial protein kinase essential for virulence



known primary structures. The kinase domains are further
divided into 12 smaller subdomains (indicated by
Roman numerals), defined as regions never interrupted
by large amino acid insertions and containing characteristic
patterns of conserved residues (consensus line in Fig. 1).

Twelve kinase domain residues are recognized as being
invariant or nearly invariant throughout the superfamily
(conserved in over 95% of 370 sequences), and hence
strongly implicated as playing essential roles in enzyme
function. Using the type α cAMP-dependent protein kinase
catalytic subunit (PKA-Cα) as a reference point,
these are equivalent to Gly50 and Gly52 in subdomain I,
Lys72 in subdomain II, Glu91 in subdomain III, Asp166
and Asn171 in subdomain VIB, Asp184 and Gly186 in
subdomain VII, Glu208 in subdomain VIII, Asp220 and
Gly225 in subdomain IX, and Arg280 in subdomain XI.
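
The twelve positions listed above can be gathered into a small lookup table, which makes it easy to check the count and to recover the numeric positions. The following is a minimal Python sketch; the names `INVARIANT_RESIDUES`, `parse_residue`, and `all_positions` are illustrative and are not part of the original review, and the numbering is that of PKA-Cα as given in the text.

```python
import re

# Invariant or nearly invariant residues by subdomain (PKA-C-alpha numbering),
# transcribed from the paragraph above. The variable name is hypothetical.
INVARIANT_RESIDUES = {
    "I": ["Gly50", "Gly52"],
    "II": ["Lys72"],
    "III": ["Glu91"],
    "VIB": ["Asp166", "Asn171"],
    "VII": ["Asp184", "Gly186"],
    "VIII": ["Glu208"],
    "IX": ["Asp220", "Gly225"],
    "XI": ["Arg280"],
}


def parse_residue(label):
    """Split a label such as 'Asp166' into ('Asp', 166)."""
    m = re.fullmatch(r"([A-Z][a-z]{2})(\d+)", label)
    if m is None:
        raise ValueError(f"not a residue label: {label!r}")
    return m.group(1), int(m.group(2))


def all_positions():
    """Return the sorted sequence positions of every listed residue."""
    return sorted(
        parse_residue(label)[1]
        for labels in INVARIANT_RESIDUES.values()
        for label in labels
    )
```

Calling `all_positions()` returns the positions in ascending order, so the first and last entries correspond to Gly50 and Arg280.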

The patterns of amino acid residues found within subdomains
VIB, VIII, and IX have been particularly
well-conserved among the individual members of the different
protein kinase families, and these motifs have been
targeted most frequently in PCR-based homology cloning
strategies aimed at identifying new family members.

Figure 1. Multiple alignments of 60 kinase domains representative of members of the eukaryotic protein kinase superfamily. The
abbreviated names used are as defined in Table 1. The single letter amino acid code is used and gaps are indicated by dashes. The
entire sequences for the larger inserts are not shown, but excluded residues are indicated as numbers in brackets. Twelve distinct
subdomains are indicated by Roman numerals. The consensus line is given according to the following code: uppercase letters, invariant
residues; lowercase letters, nearly invariant residues; o, positions conserving nonpolar residues; *, positions conserving polar
residues; +, positions conserving small residues with near neutral polarity. Residues corresponding to the numbered β-strands (b)
and α-helices (a) in PKA-Cα are indicated in the 2° structure line.

Relationship between conserved subdomains, higher 
order structure, and catalytic mechanism 

The homologous nature of the kinase domains implies
that they all fold into topologically similar 3-dimensional
core structures and impart phosphotransfer according to
a common mechanism. The larger inserts found within
some kinase domains are likely to represent surface elements
that do not disrupt the basic core structure. With
the solution of the crystal structure of mouse PKA-Cα, in
a binary complex with a pseudosubstrate peptide inhibitor
(PKI 5-24; TTYADFIASGRTGRRNAIHD, the underlined
Ala substituting for the Ser phosphoacceptor), the
general topology of a protein kinase catalytic core structure
was revealed for the first time (25, 26). Later, structures
of ternary complexes of PKA-Cα, the
pseudosubstrate inhibitor, and either MgATP or
MnAMP-PNP (an MgATP analog) were solved (27, 28).
As a consequence of these studies, precise functional
roles for most of the highly conserved kinase domain
residues have now been assigned.

The kinase domain of PKA-Cα folds into a two-lobed
structure (Fig. 2). The smaller, NH2-terminal lobe, which
includes subdomains I-IV, is primarily involved in anchoring
and orienting the nucleotide. This lobe has a
predominantly antiparallel β-sheet structure that is
unique among nucleotide binding proteins. The larger
COOH-terminal lobe, which includes subdomains
VIA-XI, is largely responsible for binding the peptide
substrate and initiating phosphotransfer. It is predominantly
α-helical in content. Subdomain V residues span
the two lobes. The deep cleft between the two lobes is
recognized as the site of catalysis. The crystal structures
of four additional eukaryotic protein kinase superfamily
members, cyclin-dependent kinase 2 (Cdk2) (29), p42
MAP kinase (Erk2) (30), twitchin kinase (31), and casein
kinase I (32), have been reported more recently, and as
expected, their kinase domains were found to fold into
two-lobed structures topologically very similar to the
catalytic core of PKA-Cα. Notable differences, however,
were found in the regions corresponding to subdomain
VIII in the Cdk2 and Erk2 structures, apparently reflecting
the fact that these are structures of enzymes in an
inactive state (see below). The twitchin structure is also of
an inactive enzyme, but in this case it is inactive due to
the presence of an autoinhibitory peptide sequence,
which lies on the COOH-terminal side of the kinase domain
and folds back into the active site cleft between the
two lobes (31). This peptide apparently forces the two



lobes to rotate almost 30° with respect to one another,
and in this configuration inactive twitchin is more similar
to the open configuration of PKA-Cα without PKI (33).
In both twitchin and Cdk2 the α-helix C in subdomain
III also adopts a different position to that of helix C in
PKA-Cα. Unfortunately, no structure of a protein-tyrosine
kinase catalytic domain was available at the time of
writing (see "Note added in proof"), but the ease with
which it has been possible to model the kinase domain of
the EGF receptor protein-tyrosine kinase on to that of
the PKA-Cα emphasizes that the structure of the protein-tyrosine
kinases will be similar to that of the protein-serine
kinases (34).

The conserved kinase subdomains correspond quite
well to precise units of higher order structure. The functions
of the individual subdomains will be discussed
briefly later on a subdomain-by-subdomain basis, making
reference to the crystal structure of PKA-Cα and
drawing attention to the proposed roles of the nearly
invariant amino acid residues (25-27, 28) and other residues
of interest. For more detailed information, the
reader is referred to recent reviews on the structure of
PKA-Cα (35-37) and to an excellent comparative review
of the structures of PKA-Cα, Erk2, and Cdk2 (38).

Subdomain I, at the NH2 terminus of the kinase domain,
contains the consensus motif Gly-x-Gly-x-x-Gly-x-Val
(starting with Gly50 in PKA-Cα). The kinase domain
NH2-terminal boundary occurs seven positions upstream
of the first glycine in the consensus, where a
hydrophobic residue is usually found. Subdomain I residues
fold into a β-strand-turn-β-strand structure encompassing
β-strands 1 and 2, and this structure acts as a
flexible flap or clamp that covers and anchors the nontransferable
phosphates of ATP. The backbone amides of
Ser53, Phe54, and Gly55 form hydrogen bonds with ATP
β-phosphate oxygens. Leu49 and Val57 contribute to a
hydrophobic pocket that encloses the adenine ring of
ATP.
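
The subdomain I consensus and the boundary rule just described lend themselves to a simple pattern search. The sketch below is illustrative only: the function name `find_glycine_loop` is hypothetical, and the test fragment is a toy sequence built so that its residues line up with those named in the text (Leu49, Gly50, Gly52, Ser53, Phe54, Gly55, Val57); the remaining letters are placeholders, not an authoritative sequence.

```python
import re

# Gly-x-Gly-x-x-Gly-x-Val in one-letter code; '.' stands for any residue (x).
GLY_LOOP = re.compile(r"G.G..G.V")


def find_glycine_loop(seq):
    """Return (index of first Gly, index of putative NH2-terminal boundary).

    Indices are 0-based. The boundary is taken seven positions upstream of
    the first glycine of the consensus, clamped to the start of the sequence.
    Returns None when the consensus is absent.
    """
    m = GLY_LOOP.search(seq)
    if m is None:
        return None
    first_gly = m.start()
    boundary = max(first_gly - 7, 0)
    return first_gly, boundary


# Toy fragment (assumption): placeholder residues around the named positions.
toy_fragment = "DQFERIKTLGTGSFGRVMLVKHK"
```

On the toy fragment the boundary index points at a Phe, consistent with the remark that a hydrophobic residue is usually found there. Real kinase domains often deviate from the strict consensus, so a production search would use a profile or alignment rather than a literal pattern.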



Subdomain II contains the invariant Lys (Lys72 in
PKA-Cα), which has long been recognized as being essential
for maximal enzyme activity. This Lys lies within
β-strand 3 of the small lobe, and helps anchor and orient
ATP by interacting with the α- and β-phosphates. In
addition, Lys72 forms a salt bridge with the carboxyl
group of the nearly invariant Glu91 in subdomain III.
Ala70 contributes to the hydrophobic adenine ring
pocket. In PKA-Cα, β-strand 3 is followed immediately
by α-helix B, which, judging from the sequence alignment,
appears to be quite a variable structure among the
protein kinases. Indeed, this α-helix is absent in the
Cdk2 and Erk2 crystal structures.

Subdomain III represents the large α-helix C in the
small lobe. The nearly invariant Glu residue (Glu91 in
PKA-Cα) is centrally located in this helix and helps stabilize
the interactions between Lys72 and the α- and β-phosphates
of ATP. Subdomain IV corresponds to the
hydrophobic β-strand 4 in the small lobe. This subdomain
contains no invariant or nearly invariant residues
and does not appear to be directly involved in catalysis or
substrate recognition.

Figure 2. Ribbon diagram of the catalytic core of PKA-Cα (residues
40-300) in a ternary complex with MgATP and pseudosubstrate
peptide inhibitor (PKI 5-24). Invariant or nearly invariant residues
(Gly50, Gly52, Gly55, Lys72, Glu91, Asp166, Asn171,
Asp184, Glu208, Asp220, and Arg280) are indicated by dots along
the ribbon diagram. Side chains are shown for Lys72, Asp166,
Asn171, Asp184, Glu208, and Arg280. β-strands and α-helices
are indicated by flat arrows and helices, respectively, and are
numbered according to Knighton et al. (26). The small arrow
indicates the site of phosphotransfer with the Ala in PKI substituting
for the phosphoacceptor Ser in the true substrate. (Reproduced,
with permission, from Taylor et al. (36).)

Subdomain V links the small and large lobes of the
catalytic subunit and consists of the very hydrophobic
β-strand 5 in the small lobe, the small α-helix D in the
large lobe, and an extended chain that connects them.
Three residues in the connecting chain of PKA-Cα,
Glu121, Val123, and Glu127, help anchor ATP by forming
hydrogen bonds with either the adenine or the ribose
ring. Met120, Tyr122, and Val123 contribute to the hydrophobic
pocket surrounding the adenine ring. Glu127
also participates in peptide binding by forming an ion
pair with an Arg in the pseudosubstrate site of the PKA
inhibitor peptide. This represents the first Arg in the PKA
substrate recognition consensus Arg-Arg-x-Ser*-Hydrophobic.
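
The substrate recognition consensus Arg-Arg-x-Ser-Hydrophobic can be written as a short pattern, which also shows why PKI 5-24 acts as a pseudosubstrate: it fits the consensus only when Ala is allowed at the phosphoacceptor position. This is a minimal sketch under stated assumptions: the function name `find_pka_sites` is hypothetical, the set of residues counted as hydrophobic is a choice made here rather than a definition from the text, and the peptide `LRRASLG` is a commonly used PKA test substrate that does not appear in the review.

```python
import re

# Residues treated as hydrophobic at position +1 (an assumption for this sketch).
HYDROPHOBIC = "AVLIMFWY"

# PKI 5-24 as quoted in the text; Ala replaces the Ser phosphoacceptor.
PKI_5_24 = "TTYADFIASGRTGRRNAIHD"


def find_pka_sites(seq, acceptor="S"):
    """Return 0-based indices of residues at the phosphoacceptor position.

    A site is reported where the sequence reads Arg-Arg-x-<acceptor>-Hydrophobic.
    """
    pattern = re.compile(f"RR.({re.escape(acceptor)})[{HYDROPHOBIC}]")
    return [m.start(1) for m in pattern.finditer(seq)]
```

With the default Ser acceptor the inhibitor yields no site, whereas `find_pka_sites(PKI_5_24, acceptor="A")` locates the substituted Ala, followed by the Ile that occupies position +1.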

Subdomain VIA folds into the large hydrophobic α-helix
E that extends through the large lobe. None of the
residues in helix E appear to interact directly with either
MgATP or peptide substrate; hence this part of the molecule
appears to act mainly as a support structure. Subdomain
VIB folds into the small hydrophobic β-strands 6
and 7 with an intervening loop. Included here are two
invariant residues (Asp166 and Asn171 in PKA-Cα) that
lie within the consensus motif His-Arg-Asp-Leu-Lys-x-x-Asn
(HRDLKxxN). The loop has been termed the
catalytic loop because Asp166 within the loop has
emerged as the likely candidate for the catalytic base,
accepting the proton from the attacking substrate hydroxyl
group during an in-line phosphotransfer mechanism.
Lys168 in the loop (substituted by Arg in the
conventional protein-tyrosine kinases) may help facilitate
phosphotransfer by neutralizing the negative charge of
the γ-phosphate during transfer. The side chain of
Asn171 helps to stabilize the catalytic loop through hydrogen
bonding to the backbone carbonyl of Asp166 and
also acts to chelate the secondary Mg2+ ion that bridges
the α- and γ-phosphates of the ATP. The carbonyl group
of Glu170 forms a hydrogen bond with an ATP ribose
hydroxyl group. Glu170 also participates in substrate
binding by forming an ion pair with the second arginine
of the peptide recognition consensus.
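
The catalytic loop consensus HRDLKxxN can be located with a literal pattern, from which the positions of the catalytic Asp and the Mg2+-chelating Asn follow directly. The sketch below is illustrative: `find_catalytic_loop` and its `allow_ptk_arg` flag are hypothetical names, the toy sequence is invented, and the flag assumes that the Lys-to-Arg substitution mentioned for protein-tyrosine kinases leaves the rest of the motif unchanged, which the text does not state.

```python
import re


def find_catalytic_loop(seq, allow_ptk_arg=False):
    """Locate the HRDLKxxN motif and report key residue indices (0-based).

    With allow_ptk_arg=True, Arg is also accepted at the Lys position
    (an assumption modelling the substitution noted for tyrosine kinases).
    Returns None when the motif is absent.
    """
    lys_position = "[KR]" if allow_ptk_arg else "K"
    pattern = re.compile(f"HRDL{lys_position}..N")
    m = pattern.search(seq)
    if m is None:
        return None
    start = m.start()
    return {
        "motif": m.group(0),
        "asp": start + 2,  # candidate catalytic base
        "asn": start + 7,  # chelates the secondary metal ion
    }


# Invented toy sequence containing one copy of the consensus.
toy_sequence = "LHSLDLIHRDLKPENLLID"
```

Individual kinases can differ from the consensus at one or more positions, so a strict match like this one is best read as a first-pass filter.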

Subdomain VII folds into a β-strand-loop-β-strand
structure, encompassing β-strands 8 and 9. The highly
conserved DFG triplet, corresponding to Asp184-Phe185-Gly186
in PKA-Cα, lies in the loop that is stabilized
by a hydrogen bond between Asp184 and Gly186.
Asp184 chelates the primary activating Mg2+ ions that
bridge the β- and γ-phosphates of the ATP, and thereby
helps to orient the γ-phosphate for transfer. In Cdk2,
β-strand 9 is replaced with a small α-helix designated
αL12. However, it is unclear whether this helical character
is maintained when Cdk2 is in its active conformation.

Subdomain VIII, which includes the highly conserved
Ala-Pro-Glu ('APE') motif (residues 206-208 in
PKA-Cα), folds into a tortuous chain that faces the cleft.
Residues lying 7-10 positions immediately upstream of
the APE motif are characteristically well-conserved
among the members of different protein kinase families.
The nearly invariant Glu corresponding to PKA-Cα
Glu208 forms an ion pair with an invariant Arg (Arg280
in PKA-Cα) in subdomain XI, thereby helping to stabilize
the large lobe.

Subdomain VIII appears to play a major role in recognition
of peptide substrates. Several PKA-Cα subdomain
VIII residues participate in binding the pseudosubstrate
inhibitor peptide. Leu198, Cys199, Pro202, and Leu205
of PKA-Cα provide a hydrophobic pocket that accommodates
the side chain of the hydrophobic residue at position
+1 of the substrate consensus (Ile for the inhibitor
peptide). Gly200 forms a hydrogen bond with the same
Ile residue. Glu203 forms two ion pairs with the Arg in
the high-affinity binding region of the inhibitor peptide.

Many protein kinases are known to be activated by
phosphorylation of residues in subdomain VIII. In
PKA-Cα, maximal kinase activity requires phosphorylation
of Thr197, probably occurring through an intermolecular
autophosphorylation mechanism (39). In the
crystal structure, phosphate oxygens of phospho-Thr197
form hydrogen bonds with the charged side chains of
Arg165, Lys189, and the hydroxyl group of Thr195, and
thereby may act to stabilize the subdomain VIII loop in
an active conformation permitting proper orientation of
the substrate peptide. For members of the Erk (MAP)
kinase family, phosphorylation of both a Thr and a Tyr
residue in subdomain VIII (mediated by members of the
MEK kinase family) is required for activation. In the crystal
structure determined for Erk2, these residues (Thr183
and Tyr185) were not phosphorylated and thus the enzyme
was in an inactive state (unlike the PKA-Cα structure).
The unphosphorylated Tyr185 is buried in a
hydrophobic pocket, and interactions with Tyr185 are
apparently required to hold the enzyme in the inactive
state. Mutation of Tyr185, however, does not activate the
enzyme, and so phosphorylation of Tyr185 must also play
a role in activation. Unphosphorylated Erk2 appears to be
inactive because residues required for catalysis are not
properly oriented, and because its conformation results
in a partial steric block to substrate binding. During activation
of Erk2, Tyr185 phosphorylation precedes Thr183
phosphorylation; therefore, binding of MEK to Erk2 may
alter the conformation of the subdomain VIII loop,
thereby exposing Tyr185 for phosphorylation by MEK.
Interaction of phospho-Tyr185 with surface residues
would then allow the subdomain VIII loop to adopt the
active conformation (30). Subsequent phosphorylation of
the exposed Thr183 may activate the enzyme fully by
promoting correct alignment of the catalytic residues.
From the crystal structure of Cdk2, likewise in an inactive
unphosphorylated state, the subdomain VIII loop appears
to be in a conformation that would inhibit enzyme activity
by sterically blocking the presumed protein substrate
binding cleft (29). Phosphorylation of Thr160 in the Cdk2
subdomain VIII, mediated by MO15 (CAK), presumably
would act to remove this inhibition by stabilizing the loop
in an active conformation similar to that found in
PKA-Cα. Cyclin binding to the NH2-terminal lobe is also
needed to activate Cdk2, and this may cause rotation of
the NH2-terminal domain resulting in correct alignment
of catalytic residues.

Subdomain IX corresponds to the large α-helix F of 
the large lobe. The nearly invariant Asp corresponding to 
PKA-Cα Asp220 lies in the NH2-terminal region of this 
helix and acts to stabilize the catalytic loop by hydrogen 
bonding to the backbone amides of Arg165 and Tyr164 
that precede the loop. Glu230 of PKA-Cα forms an ion 
pair with the second Arg of the peptide recognition con- 
sensus. PKA-Cα residues 235-239 are all involved in hy- 
drophobic interactions with the inhibitor peptide. 

Subdomain X is the most poorly conserved subdomain 
and its function is obscure. In the crystal structure of 
PKA-Cα, it corresponds to the small α-helix G that occu- 
pies the base of the large lobe. Members of the Cdk, Erk 
(MAP), GSK3, and Clk kinase families (the C-M-G-C 
group) all have rather large insertions between subdo- 
mains X and XI, whose functional significance is presently 
unclear. Subdomain XI extends to the COOH-terminal 
end of the kinase domain. The most notable feature here 
is the nearly invariant Arg corresponding to Arg280 in 
PKA-Cα, which lies between α-helices H and I. The 
COOH-terminal boundary of the kinase domain is still 
poorly defined. For many protein-serine kinases, the con- 
sensus motif His-x-Aromatic-Hydrophobic is found be- 
ginning 9-13 residues downstream of the invariant Arg. 
For protein-tyrosine kinases, a hydrophobic amino acid 
lying 10 positions downstream of the invariant Arg ap- 
pears to define the COOH-terminal boundary. 
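
The two boundary rules just described can be sketched as a simple sequence scan. This is only an illustration, not a method from the text: the function names are hypothetical, the membership of the "aromatic" and "hydrophobic" residue classes is an assumption, and the toy sequence is made up rather than taken from any real kinase.

```python
# Illustrative residue classes; their exact membership is an assumption.
AROMATIC = set("FWY")
HYDROPHOBIC = set("AVLIMFWC")


def find_his_x_aro_hyd(seq, arg_index, min_offset=9, max_offset=13):
    """Return the 0-based start of a His-x-Aromatic-Hydrophobic motif that
    begins min_offset..max_offset residues after the invariant Arg, or None."""
    for offset in range(min_offset, max_offset + 1):
        start = arg_index + offset
        window = seq[start:start + 4]
        if len(window) < 4:
            break  # ran off the end of the sequence
        if (window[0] == "H" and window[2] in AROMATIC
                and window[3] in HYDROPHOBIC):
            return start
    return None


def ptk_boundary(seq, arg_index, offset=10):
    """Return the index `offset` residues after the invariant Arg if that
    residue is hydrophobic, else None."""
    pos = arg_index + offset
    if pos < len(seq) and seq[pos] in HYDROPHOBIC:
        return pos
    return None


# Made-up toy sequence: Arg at index 0, motif "HSFL" starting at index 11.
toy = "R" + "G" * 10 + "HSFL" + "GG"
print(find_his_x_aro_hyd(toy, arg_index=0))
```

In practice the index of the invariant Arg would come from an alignment against a reference kinase domain; here it is simply passed in.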

The amphipathic α-helix A of PKA-Cα (residues 
15-35; not shown in Fig. 2), though lying outside of the 
conserved catalytic core on the NH2-terminal side, ap- 
pears to be an important feature found in many protein 
kinases (40). This helix spans the surface of both lobes of 
the core structure and complements and stabilizes the 
hydrophobic cleft between the two lobes. The A-helix 
motif appears to be present in many other protein kinases 
including members of the protein kinase C family and the 
Src family of protein-tyrosine kinases (40). 

CLASSIFICATION OF EUKARYOTIC PROTEIN 
KINASES 

To facilitate analysis and management of this large super- 
family we have devised the classification scheme shown in 
Table 1, which subdivides the known members of the 
eukaryotic protein kinase superfamily into distinct fami- 
lies that share basic structural and functional properties. 
Phylogenetic trees derived from an alignment of kinase 
domain amino acid sequences (essentially an expanded 
version of Fig. 1) served as the basis for this classification. 
Thus, the sole consideration was similarity in kinase do- 
main amino acid sequence. When considered alone, how- 
ever, this property has been a good indicator of other 
characteristics held in common by the different members 
of the family. 

Protein kinases whose entire kinase domain amino acid 
sequence had been published by July 1993 were included 
in phylogenetic analysis (as well as a few others made 
available at that time through sequence databases). If a 
given kinase domain sequence had been determined from 
more than one species among the vertebrates (i.e., or- 
thologous gene products), only one representative (usu- 
ally human) was included in the analysis. This policy was 
not used for the other phyla, however, because of greater 
divergences between the species and, hence, the se- 
quences. The kinase domain phylogenies were inferred 
using the principle of maximum parsimony according to 
the PAUP software package developed by Swofford (41). 
Minimum-length trees were found using PAUP's "heuris- 
tic" search method with branch swapping by the "tree 
bisection-reconnection" strategy. Equal weights were 
given for all amino acid substitutions. Because multiple 
minimum-length trees were found, a consensus tree was 
calculated according to the method of Adams (cited in ref. 
41) in order to show branching ambiguities. 
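
To make the parsimony criterion concrete, the sketch below scores a fixed tree with the classic Fitch algorithm, counting every amino acid substitution as one step (equal weights). It is not PAUP and does not reproduce its heuristic search or branch swapping; the taxon names and the three-column alignment are invented purely for illustration.

```python
def fitch_column(tree, states):
    """Fitch pass for one alignment column on a rooted binary tree.

    `tree` is a leaf name (str) or a (left, right) tuple.
    Returns (possible ancestral states, minimum number of substitutions).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_column(left, states)
    right_set, right_cost = fitch_column(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: one substitution is required at this node.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Total substitutions over all columns, every change weighted equally."""
    n_cols = len(next(iter(alignment.values())))
    return sum(
        fitch_column(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_cols)
    )


# Hypothetical taxa and residues, for illustration only.
alignment = {"a": "KDL", "b": "KDL", "c": "RDL", "d": "RDM"}
tree_1 = (("a", "b"), ("c", "d"))
tree_2 = (("a", "c"), ("b", "d"))
print(tree_length(tree_1, alignment), tree_length(tree_2, alignment))
```

Under maximum parsimony the shorter tree is preferred; a search procedure would propose rearranged topologies and keep those of minimum length.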

To accommodate the large numbers of sequences, it 
was necessary to construct five separate trees. Initially, a 
skeleton tree of 99 kinases was obtained (Fig. 3A). The 
skeleton tree included only representative members from 
each of four large groups of protein kinases, each consist- 
ing of multiple related families known from previous 
work to cluster together in the tree. These four groups 
are designated: 1) the AGC group, which includes the 
cyclic-nucleotide-dependent family (PKA and PKG), the 
protein kinase C (PKC) family, the β-adrenergic receptor 
kinase (βARK) family, the ribosomal S6 kinase family, and 
other close relatives; 2) the CaMK group, which includes 
the family of protein kinases regulated by calcium/cal- 
modulin, the Snf1/AMPK family, and other close rela- 
tives; 3) the CMGC group, which includes the family of 
cyclin-dependent kinases, the Erk (MAP) kinase family, 
the glycogen synthase kinase 3 (GSK3) family, the casein kinase 
II family, the Clk (Cdk-like kinase) family, and other close 
relatives; and 4) the "conventional" protein-tyrosine ki- 
nase (PTK) group. Separate trees (Fig. 3B-E) were later 
obtained for each of the four large kinase groups, and 
contain all members of the groups whose sequences were 
available at the time of analysis. 

Figure 3. Phylogenetic trees of the eukaryotic protein kinase 
superfamily inferred from kinase domain amino acid sequence 
alignments. The abbreviated nomenclature is the same used in 
Table 1. A) "Skeleton" tree showing 99 protein kinases. Positions 
of 4 clusters (AGC, CaMK, CMGC, and PTK) containing protein 
kinases representative of larger groups are indicated in the skele- 
ton tree. B) AGC group tree of 59 protein kinases including PKA, 
PKG, and PKC and other close relatives. C) CaMK group tree of 
35 protein kinases including the calcium/calmodulin-regulated 
enzymes. D) CMGC group tree of 59 protein kinases including 
the cyclin-dependent kinases. E) PTK group tree of 90 conven- 
tional protein-tyrosine kinases. Tree A is unrooted and drawn 
with Pkn1 and Pkn2 as outgroups. Outgroups of two or more 
distantly related protein kinases (not shown) were included in the 
analysis of trees B-E to provide a rooting point. Asterisks (*) in 
all trees indicate branches leading to defined protein kinase 
families listed in Table 1. Branch lengths indicate number of 
amino acid substitutions required to reach hypothetical common 
ancestors at internal nodes. 



It can be reasonably surmised that the protein kinases 
having closely related catalytic domains, and thus defining 
a family, represent products of genes that have under- 
gone relatively recent evolutionary separations. Given 
this, it should come as no surprise that members of a 
given family tend also to share related functions. This is 
manifest by similarities in overall structural topology, 
mode of regulation, and substrate specificity. The details 
of the common properties exhibited by the members of 
the various kinase families can best be gleaned from 
studying the information outlined in the individual en- 
tries section of the Protein Kinase Factsbook (42). Some of 
the most salient relationships are discussed below. 

The AGC group protein kinases tend to be basic amino 
acid-directed enzymes, phosphorylating substrates at 
Ser/Thr residues lying very near Arg and Lys. For the 
cyclic nucleotide-dependent and ribosomal S6 kinase 
families, the preferred substrates have basic residues lying 
in specific positions NH2-terminal to the phosphate ac- 
ceptor. Preferred substrates for the PKC and RAC fami- 
lies have basic residues on both the NH2- and COOH- 
terminal sides of the acceptor (43). The G-protein-cou- 
pled receptor kinases (βARK and RhK) appear to break 
this rule, however, as they are reported to prefer synthetic 
peptide substrate residues located within an acidic envi- 
ronment. Little substrate information is available for the 
other families in this group. 



The CaMK group protein kinases also tend to be basic 
amino acid-directed, and in this regard it is notable that 
the AGC and CaMK groups fall near one another in the 
phylogenetic tree. CaMK1, CaMK2, CaMK4, MLCK, 
CDPK, and AMPK are all reported to prefer substrates 
with basic residues at specific positions NH2-terminal to 
the acceptor site, whereas EF2K and PhK prefer sites with 
basic residues at both NH2- and COOH-terminal loca- 
tions. Many, but not all, of the CaMK group protein 
kinases are known to be activated by Ca2+/calmodulin 
binding to a small domain located just COOH-terminal 
to the catalytic domain, e.g., CaMK1, CaMK2, CaMK4, 
PhKγ, MLCK, and twitchin. These enzymes and their 
close relatives are grouped together in a large family 
within the CaMK group. Also included in this family is 
a subfamily of plant enzymes (represented by CDPK) that 
contain an intrinsic calmodulin-like domain that confers 
Ca2+-dependent activation. The other family within the 
CaMK group is the Snf1/AMPK family. Within this fam- 
ily, substrate specificity determinant information has 
been obtained only for the AMP-activated protein kinase, 
which also shows a requirement for an NH2-terminal 
basic residue. 

The other major category of protein-serine 
kinases is the CMGC group. For the most part, these are 
proline-directed enzymes, phosphorylating substrates at 
sites lying in Pro-rich environments. Available data for 
Cdc2 and Cdk2 indicate that members of the cyclin-de- 
pendent kinase family require phosphate acceptors lying 
immediately NH2-terminal to a Pro. A similar require- 
ment is indicated for the Erk (MAP) kinase family. The 
situation for the GSK3 family is more complicated, but 
most known acceptor sites lie within Pro-rich regions. 
The structures of Cdk2 and Erk2 indicate that the pocket 
for the +1 residue is shallower than in PKA-Cα due to the 
replacement of Leu205 by an Arg, which is bulkier and 
precludes binding of the larger hydrophobic amino acids. 
In addition, the unique secondary amide group of Pro 
may make special interactions (44). The casein kinase II 
family enzymes fail to conform to the proline-directed 
specificity exhibited by the other major families of this 
group, showing instead a strong preference for Ser resi- 
dues located NH2-terminal to a cluster of acidic residues. 
The CMGC group protein kinases have larger-than-aver- 
age kinase domains due to insertions between subdo- 
mains X and XI, whose functional significance is 
unknown. 
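
The substrate preferences described for these groups can be sketched as simple pattern scans over a peptide. This is a rough illustration under stated assumptions: the function names are hypothetical, the three-residue window used for "basic residues NH2-terminal to the acceptor" is an arbitrary choice rather than a value from the text, and the peptide is made up.

```python
import re


def proline_directed_sites(seq):
    """Indices of Ser/Thr lying immediately NH2-terminal to a Pro."""
    return [m.start() for m in re.finditer(r"[ST](?=P)", seq)]


def basic_directed_sites(seq, window=3):
    """Indices of Ser/Thr with an Arg or Lys within `window` residues on the
    NH2-terminal side (window size is an assumption for illustration)."""
    sites = []
    for i, residue in enumerate(seq):
        upstream = seq[max(0, i - window):i]
        if residue in "ST" and any(r in "RK" for r in upstream):
            sites.append(i)
    return sites


# Made-up peptide: Ser at index 5 follows two Arg; Ser at index 9 precedes Pro.
peptide = "AARRASLGGSPAK"
print(proline_directed_sites(peptide))
print(basic_directed_sites(peptide))
```

Real specificity determinants are position-dependent and family-specific, so a scan like this can only flag candidate sites for closer inspection.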

The conventional protein-tyrosine kinase group in- 
cludes a large number of enzymes with quite closely re- 
lated kinase domains that specifically phosphorylate on 
Tyr residues (i.e., they cannot phosphorylate Ser or Thr). 
These enzymes, first recognized among retroviral onco- 
proteins, have been found only in metazoan cells where 
they are widely recognized for their roles in transducing 
growth and differentiation signals. Included in this group 
are more than a dozen distinct receptor families made up 
of membrane-spanning molecules that share similar over- 
all structural topologies, and nine nonreceptor families 
also composed of structurally similar molecules. The 
specificity determinants surrounding the Tyr phosphoac- 
ceptor sites have yet to be firmly established for these 
enzymes, but Glu residues either on the NH2- or COOH- 
terminal side of the acceptor are often preferred. This 
group is labeled "conventional" to distinguish it from 
other protein kinases (including Spk1, Clk, the MEK/Ste7 
family members, Wee1/Mik1, ActRII, Hrr25, Esk, and 
SplA/DPyk2) reported to exhibit a dual specificity, that 
is, being capable of phosphorylating both Tyr and 
Ser/Thr residues (45). However, in most cases dual speci- 
ficity has been observed only for autophosphorylation 
reactions in vitro, and the only dual specificity protein 
kinases that are known to be able to phosphorylate a 
substrate on Ser/Thr and Tyr are members of the MEK 
family. Considered as a group, these dual-specificity pro- 
tein kinases are not particularly closely related to the 
conventional PTKs. Indeed, they seem to map through- 
out the phylogenetic tree (45), suggesting that the ability 
to autophosphorylate on Tyr may have had many inde- 
pendent origins during the evolutionary history of the 
superfamily. 

The protein kinases falling outside the four major 
groups are a mixed bag. Although the individual mem- 
bers within the defined families found in this "other" 
category clearly are related to one another through both 
structure and function, it is difficult to make broader 
generalizations that could group any of these families 
together into a larger category. As far as substrate speci- 
ficity determinants go, little is known about most "other" 
category protein kinases, due primarily to their rather 
recent discovery and the paucity of known physiological 
substrates. The casein kinase I family members, however, 
have been shown to prefer Ser/Thr residues located 
COOH-terminal to a phosphoserine or phosphothreon- 
ine, although a stretch of acidic residues may substitute. 
Also, the family of protein kinases involved in transla- 
tional control (HRI, PKR/Tik, Gcn2) appears to be basic 
amino acid-directed enzymes preferring Ser residues ly- 
ing NH2-terminal to an Arg. Finally, as mentioned pre- 
viously, the MEK/Ste7 family protein kinases and 
Wee1/Mik1 protein kinases exhibit a dual specificity. 

Although this classification is based solely on catalytic 
domain sequences, members of families defined by this 
means are usually closely related in regions lying outside 
the catalytic domains and in many cases have been shown 
to possess very similar functions. Thus, intercalation of 
newly discovered protein kinases into this classification 
should allow one to make useful predictions about the 
functions of such enzymes. 



FUTURE PROSPECTS 

The rate of protein kinase discovery still shows no signs 
of abating. In addition to the continuing successes of 
homology-based approaches, genomic sequencing pro- 
jects are beginning to make significant contributions. For 
instance, the sequences of two entire budding yeast chro- 
mosomes (46, 47) and a ~2 Mb stretch of C. elegans chro- 
mosome III (48) have revealed a number of new putative 
protein kinase genes. As genome sequencing projects 
gather speed, the number of new protein kinase genes 
discovered in this way will undoubtedly mushroom. This 
explosion of sequence data is making it increasingly diffi- 
cult to manage protein kinase databases of the sort de- 
scribed here. Programs designed to align and derive 
relatedness trees are currently unable to handle the large 
number of available kinase domain sequences. New data 
handling programs will have to be developed to cope with 
large numbers of sequences like those of the eukaryotic 
protein kinase superfamily. 

Protein kinase catalytic domain structures will continue 
to be solved. The first structure of a conventional pro- 
tein-tyrosine kinase will be available shortly (see "Note 
added in proof"), and this should reveal how Tyr is se- 
lected as an acceptor amino acid vs. Ser/Thr. Such struc- 
tures will enable comparative analysis to be carried out at 
the 3-dimensional level, and allow predictions of struc- 
tures from primary sequences. Structural comparisons of 
catalytic domains with bound peptide substrates will also 
provide insights into substrate specificity. Most protein 
kinases show some degree of primary sequence specific- 
ity, and new methods are being developed to determine 
consensus sequence specificities for individual protein ki- 
nases (44). With such consensus information the struc- 
tural basis for the binding of a preferred peptide 
sequence to the cognate substrate binding site can then 
be deduced. In the future, it may be possible to model the 
3-dimensional structure of a novel protein kinase cata- 
lytic domain with sufficient accuracy to be able to deduce 
the preferred primary sequence surrounding the hy- 
droxyamino acid it phosphorylates, which in turn will 
allow one to predict what proteins might be its substrates 
from the increasingly complete database of protein se- 
quences. 



Note added in proof: The crystal structure of the tyrosine kinase 
domain of the insulin receptor has now appeared (Hubbard, 
S. R., Wei, L., Ellis, L., and Hendrickson, W. A. (1994) Nature 372, 
746-754). 

The reversible phosphorylation of proteins on serine, thre- 
onine, and tyrosine residues represents a fundamental 
strategy used by eukaryotic organisms to regulate a host of 
biological functions, including DNA replication, cell cycle 
progression, energy metabolism, and cell growth and dif- 
ferentiation. Levels of cellular protein phosphorylation 
are modulated both by protein kinases and phosphatases. 
Although the importance of kinases in this process has 
long been recognized, an appreciation for the complex and 
fundamental role of phosphatases is more recent. Through 
extensive biochemical and genetic analysis, we now know 
that pathways are not simply switched on with kinases and 
off with phosphatases. Rather, it is the balance of phos- 
phorylation that is often critical. Protein phosphorylation 
can regulate enzyme function, mediate protein-protein in- 
teractions, alter subcellular localization, and control pro- 
tein stability. Furthermore, kinases and phosphatases may 
work together to modulate the strength of a signal. Adding 
further complexity to this picture is the fact that both ki- 
nases and phosphatases can function in signaling networks 
where multiple kinases and phosphatases contribute to the 
outcome of a pathway. To fully understand this complex 
and essential regulatory process, the kinases and phos- 
phatases mediating the changes in cellular phosphoryla- 
tion must be identified and characterized. 

A variety of approaches, including biochemical purifica- 
tion, gene isolation by homology, and genetic screens, 
have been successfully used for the identification of puta- 
tive protein kinases and phosphatases. Now, the genomic 
sequencing of organisms promises to be a major contribu- 
tor to this field. Valuable insight into these important en- 
zymes has already emerged from the analysis of the yeast 
and worm genomes. In particular, genomic sequencing of 
Saccharomyces cerevisiae and Caenorhabditis elegans has 
revealed the kinase and phosphatase gene families that 
have arisen during the evolution of multicellular eukary- 
otes (Plowman et al., 1999). With the recent determination 
of the Drosophila sequence, we can now survey the ge- 
nome of a second multicellular eukaryote for its repertoire 
of kinases and phosphatases. In this review, we will 
present our findings on the protein kinase and phos- 
phatase gene families identified in the fly, together with an 
examination of the kinase/phosphatase signaling pathways 
functioning in flies, worms, and humans. 

Identification and Classification of Drosophila 
Protein Kinases and Phosphatases 

Our survey of Drosophila protein kinases and phospha- 
tases is based on the total set of predicted proteins that 
were identified in the Drosophila genome using auto- 
mated gene predictor methods (Adams et al., 2000; avail- 
able at http://www.celera.com). The 13,601 predicted fly 
proteins were surveyed for overall homology with known 
kinase and phosphatase sequences using BLASTP, and for 
the presence of polypeptide motifs using BLOCKS and In- 
terPro databases (Rubin et al., 2000). Putative kinases and 
phosphatases identified by these means were further clas- 
sified based on the presence of diagnostic amino acid resi- 
dues in conserved motifs and by sequence similarities 
extending beyond conserved catalytic domains. Table I 
summarizes our survey of the Drosophila protein kinases 
and phosphatases. It is important to realize that this analy- 
sis represents the first tabulation of these enzymes in 
Drosophila and will be subject to revision as gaps in the 
genomic sequence are closed and methods for predicting 
and analyzing genes are improved. In particular, it is 
known that the Genie and Genscan programs used to an- 
notate the fly genomic sequence make systematic errors 
with respect to intron-exon boundaries and gene borders, 
leading us to conclude that some kinase and phosphatase 
proteins may have been missed by these programs (Reese 
et al., 2000). These caveats notwithstanding, 251 kinases 
and 86 phosphatases were identified by our analysis of the 
predicted Drosophila protein set. Remarkably, more than 
half of these molecules had gone undetected in eight de- 
cades of Drosophila research. 
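
As a quick consistency check on these totals, the fly counts reported in Table I can be summed by category. This is a sketch, not part of the original analysis: it assumes the figure of 251 kinases counts both the "protein kinase" rows (including fragment/unknown) and the "protein kinase like" rows, and the dictionary layout is simply a convenient transcription of the table.

```python
# Fly counts transcribed from Table I (totals only, not the newly identified subsets).
fly_counts = {
    "protein kinase": {
        "AGC": 30, "CaMK": 25, "CKI": 8, "CMGC": 24, "STE": 21,
        "PTK": 32, "OPK": 56, "Atypical": 3, "Fragment/unknown": 18,
    },
    "protein kinase like": {"Gcyc": 11, "PIK": 13, "DAG": 8, "Choline K": 2},
    "phosphatase": {"STP": 28, "RPTP, CPTP, LMW-PTP": 20, "DSP": 18, "IPP": 20},
}

# Assumption: "kinases" in the text means protein kinases plus kinase-like proteins.
kinase_total = (sum(fly_counts["protein kinase"].values())
                + sum(fly_counts["protein kinase like"].values()))
phosphatase_total = sum(fly_counts["phosphatase"].values())

print(kinase_total, phosphatase_total)
```

Under that assumption the category sums reproduce the 251 kinases and 86 phosphatases quoted in the text.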

Protein Kinases 

Eukaryotic protein kinases are enzymes that catalyze the 
transfer of phosphate from ATP or GTP onto serine, thre- 
onine, or tyrosine residues of their appropriate substrates. 
They comprise a single protein superfamily having a com- 
mon catalytic structure. However, these enzymes can be 
subdivided into distinct groups based on their structural 
and functional properties (Hanks and Hunter, 1995). 

AGC Family 

The AGC serine/threonine kinases function in many intra- 
cellular signaling pathways and were first classified based 
on their tendency to phosphorylate sites surrounded by 
basic amino acids. Drosophila contains ~30 AGC kinases, 
including members of the cyclic nucleotide-dependent ki- 
nases, protein kinase C (PKC),¹ AKT, NDR, MNK, 
MAST, ribosomal S6 kinase, and G protein-coupled re- 
ceptor kinase families. The majority of the fly AGC ki- 
nases had been identified previously by molecular and ge- 
netic analysis; however, eight members were uncovered in 
the fly genome project. Interestingly, four of the new 
genes encode PKC or PKC-related proteins, including the 
first atypical PKC isoforms identified in Drosophila. Also 
identified by the fly genome project were additional PKA 
and PKG proteins, as well as kinases related to mamma- 
lian MAST205 and Citron. 

Table I. Summary of Protein Kinases and Phosphatases in 
Flies, Worms, and Humans 

Group                    Fly        Worm*   Humans* 
Protein kinase 
  AGC                    30 (8)     30      100 
  CaMK                   25 (13)    32      83 
  CKI                    8 (6)      87      5 
  CMGC                   24 (7)     42      62 
  STE                    21 (12)    28      63 
  PTK                    32 (8)     92      100 
  OPK                    56 (28)    62      163 
  Atypical               3 (2)      4       11 
  Fragment/unknown       18 
Protein kinase like 
  Gcyc                   11 (6)     26      8 
  PIK                    13 (8)     12      20 
  DAG                    8 (5)      7       8 
  Choline K              2 (1)      7       2 
Phosphatase 
  STP                    28 (14)    65      21 
  RPTP, CPTP, LMW-PTP    20 (12)    83      47 
  DSP                    18 (11)    26      51 
  IPP                    20 (18)    11      7 

Fly numbers in parentheses represent the proteins newly identified by the fly genome 
project. 

*These numbers are taken from the review by Plowman et al. (1999). 

CaMK Family 

The CaMK serine/threonine kinases also tend to have sub- 
strate recognition motifs containing basic amino acids, 
and some but not all members of this family are regulated 
by calcium or calmodulin. Approximately 25 CaMKs are 
present in Drosophila, including representatives of the cal- 
cium/calmodulin-regulated kinase, SNF1/AMP-dependent 
kinase, EMK, CHK2, myosin light chain kinase (MLCK), 
phosphorylase kinase, death-associated protein kinase, 
and MAPKAP kinase families (the last four of which 
are found in C. elegans but not yeast). Like worms, flies 
do not encode a complete ortholog of the mammalian 
Trio kinase, but do have a protein that is related to the en- 
tire Trio regulatory domain. CaMK members revealed by 
the fly genome project include proteins related to calcium/ 
calmodulin-regulated kinases, MLCK, EMK, and mam- 
malian DRAK1. Of the 13 newly identified CaMKs, 6 be- 
long to the EMK family, making this the largest CaMK 
group in flies. Mammalian and C. elegans EMK proteins 
have been implicated in the regulation of cell polarity and 
microtubule stability (Drewes et al., 1998). 

¹ Abbreviations used in this paper: CDK, cyclin-dependent kinase; CKI, 
casein kinase I; CTK, cytoplasmic tyrosine kinase; DSP, dual specificity 
phosphatase; LMW, low molecular weight; MKP, MAPK phosphatase; 
PKC, protein kinase C; PTP, protein tyrosine phosphatase; RTK, receptor 
tyrosine kinase; STP, serine/threonine protein phosphatase. 

Casein Kinase I Family 

The casein kinase I (CKI) proteins originally were charac- 
terized as ubiquitous serine/threonine kinases with a pref- 
erence for acidic substrates such as casein. Although mem- 
bers of this family were among the first kinases purified, 
elucidating their function and regulation has been difficult. 
Recently, however, CKI isoforms have been found to play 
a role in DNA repair and cell division (Gross and Ander- 
son, 1998), in the Wnt signaling pathway (Peters et al., 
1999), and in circadian rhythm regulation (Lowrey et al., 
2000). Drosophila contains at least eight CKI proteins, 
only two of which were known previously. Intriguingly, 
CKI is one of the kinase families that is significantly ex- 
panded in the worm, with 87 members identified in C. ele- 
gans (Plowman et al., 1999). The biological significance of 
the worm-specific expansion is currently unknown. 

CMGC Family 

CMGC family members are primarily proline-directed 
serine/threonine kinases. The major subfamilies of this 
group play key roles in cell cycle regulation and intracellu- 
lar signal transduction, and, not surprisingly, are con- 
served from yeast to humans. Approximately 24 CMGC 
kinases are found in Drosophila, including members of the 
cyclin-dependent kinase (CDK), CDC-like kinase (CLK), 
glycogen synthase kinase 3 (GSK3), and MAPK families. 
Although extensive genetic analysis had revealed many of 
the Drosophila CMGC kinases, seven novel proteins were 
uncovered by the fly genome project. These include addi- 
tional CDK (CDK7-like, CDC2-related KKIALRE, CHED- 
related), GSK3, and MAPK (ERK7) members, as well as 
an RCK family member (MAK). Also uncovered in the fly 
genome were proteins related to the MP1 and JIP-1 scaf- 
folding proteins. These molecules function to localize 
MAPK proteins with their upstream activators and pro- 
vide signaling specificity (Whitmarsh and Davis, 1998). Al- 
though MAPK scaffolding proteins are present in yeast, 
they are structurally different from the ones found in flies, 
worms, and mammals, perhaps indicating the evolution of 
these molecules in multicellular eukaryotes. 

STE Family 

The STE family is composed of the STE7 (MEK), STE11 
(MEKK), and STE20 (MEKKK) kinases that function up- 
stream of MAPK proteins. Drosophila contains ~21 mem- 
bers of this family, only 9 of which were known previously. 
Remarkably, 9 members of the PAK/STE20 group were 
uncovered by the fly genome project, including proteins 
related to mammalian PAK3, GLK1, NIK, MST2, STLK3, 
TAO1, and CDC7. Although PAK proteins containing PH 
domains are found in yeast (Sells et al., 1999), no PH- 
domain-containing PAKs have been identified in higher eu- 
karyotes and none are present in Drosophila. MEKK- and 
NEK-related kinases were also revealed by the genome 
project. It is worth noting that even with the discovery of 
additional MEK and MAPK proteins in the fly, C. elegans 
contains over twice as many of these kinases, suggesting an 
expansion of MAPK signaling modules in the worm. 

PTK Family 

The PTK group consists of receptor (RTK) and cytoplas- 
mic (CTK) tyrosine kinases. Although yeasts contain no 
conventional PTKs, 92 have been identified in the worm 
and ~32 are present in the fly. A major function of PTKs 
is in intercellular communication, perhaps explaining why 
these enzymes have only been identified in multicellular 
eukaryotes. In comparison to Drosophila, the much larger 
number of PTKs found in C. elegans is due primarily to ex- 
pansions of the worm-specific Kin-15/16 RTK and FER 
CTK families. The majority of the fly PTKs had been iden- 
tified previously by genetic approaches, reflecting the in- 
volvement of these proteins in critical growth and devel- 
opmental pathways. RTKs encoded in the fly genome 
include the fly-specific Torso and Sevenless kinases, as 
well as kinases related in sequence if not function to the 
mammalian EGFR, FGFR, insulin receptor, EPH, RET, 
ROR, RYK, ALK, and TRK kinases. Of the five newly 
identified RTKs, two are related to mammalian PDGFR/ 
VEGFR, two are DDR receptors, and one shares homol- 
ogy with FGFR1. In the CTK group, fly members include 
the JAK, FAK, SYK/SHARK, ACK, ABL, and FPS ki- 
nases. Of the newly identified CTKs, one is related to 
mammalian ACK2 and one is an ortholog of CSK, a ki- 
nase that negatively regulates the activity of mammalian 
SRC kinases. Interestingly, several members of the PTK 
class are not found in worms, including representatives of 
the SYK, JAK, TRK, and RET families. 

OPK Group 

This group is comprised of other protein kinase (OPK) 
families that do not belong to the six major groups de- 
scribed above. It is the largest class of kinases found in 
flies and consists of both serine/threonine and dual speci- 
ficity kinases. Approximately 56 of these enzymes are 
present in the fly genome, only half of which were known 
previously. Representatives of this group are extremely di- 
verse and include members of the following families: Au- 
rora, BUB1, CHK1, DYRK, WEE-1, PLK, EIF2, TGFβ 
and activin receptor, TAK, IKK kinases, CKII, and RAF 
kinase. Notable in the novel group are additional BUB1 
and TAK members and enzymes related to C. elegans 
UNC-51 and mammalian ALK3, DLK, GAK, MLK2, 
SRPK, IRE, ILK, TLK1, LIM-domain kinase, and LKB1/ 
Peutz-Jeghers kinase. 

Atypical, Lipid, and Unknown Kinases 

Several protein groups that are structurally related to the 
eukaryotic protein kinases are also found in the Drosoph- 
ila genome. These include the atypical kinases, guanylyl 
cyclases, and the eukaryotic lipid kinases. Flies contain at 
least three atypical kinase members, pyruvate dehydroge- 
nase kinase, A6, and a newly identified BCR protein. Al- 
though worms lack BCR, they do contain a protein related 
to the atypical Dictyostelium myosin heavy chain kinase, 
which appears to be missing in flies. Also absent in both 
Drosophila and C. elegans are representatives of the classi- 
cal prokaryotic histidine kinases. In the lipid kinase group, 
Drosophila encodes at least 8 diacylglycerol kinases, 2 
choline/ethanolamine kinases, and 13 phosphatidylinositol 
kinases (PI3-, PI4-, PIP5-, and PIP3-related kinases), the 
majority of which were unknown previously. In mamma- 
lian cells, members of the PIP3-related kinase family par- 
ticipate in the cellular response to DNA damage and have 
authentic protein kinase activity (for review see Fruman 
et al., 1998). The fly genome project has revealed three ki- 
nases of this group, namely ATM, FRAP-related protein 
(FRP), and FRAP/TOR; however, as is true for worms, 
flies do not contain a DNA-PK. Finally, ~18 proteins were 
identified that represent either partial kinase fragments or 
kinases with no significant homology to the groups listed 
above. Since errors have been identified in the transcript 
annotation of several protein kinases, such as the DDR re- 
ceptors, Citron, and a PKC isoform, some of the partial ki- 
nase sequences may represent intact enzymes that have 
been improperly annotated. Further analysis will be re- 
quired to confirm their identity. 

Protein Phosphatases 

Unlike protein kinases, which share a common catalytic 
structure, protein phosphatases have different basic struc- 
tures, use distinct catalytic mechanisms, and comprise at 
least three separate protein families. Phosphatases are typ- 
ically classified into two main groups, the serine/threonine 
protein phosphatases (STPs) and protein tyrosine phos- 
phatases (PTPs). 

STPs 

STPs can be subdivided into the PPP and PPM families 
based on distinct amino acid sequences and crystal structures
(for review see Cohen, 1997). Both families are
widely distributed across phyla with representatives found 
in yeast, flies, worms, and mammals. Before the Drosoph- 
ila sequencing project, almost all known fly STPs had been 
identified by molecular cloning approaches. Very few 
STPs have been isolated by genetic analysis, indicating 
that shared substrate specificity and/or functional redun- 
dancy may have prevented the recovery of such mutants. 
Drosophila contains ~28 STPs, whereas >65 are encoded 
in the C. elegans genome. The increased number of worm 
STPs appears to be due to an expansion of the PPP family. 
Members of the PPP family, such as PP1, PP2A, and
PP2B, have been implicated in numerous biological pro- 
cesses and signal transduction pathways. The diverse func- 
tions of this family are accomplished by a relatively small 
number of highly conserved catalytic subunits that com- 
plex with a wide variety of regulatory proteins, thus tar- 
geting the enzyme to specific intracellular locations and 
substrates. The Drosophila genome encodes ~17 PPP catalytic
proteins: 8 PP1-related enzymes (including PP1s,
PPN, and PPY), 4 PP2A members (including PP2A, PP4,
and PPV), 3 PP2B-like molecules, and 2 PP5 proteins. Additional
PPP catalytic subunits uncovered by the fly genome
project include members of the PP1, PP4, and PP2B
groups. In regard to PPP regulatory subunits, Drosophila
contains at least 3 PP1, 5 PP2A, and 2 PP2B proteins.
However, because the regulatory subunits are so diverse, 
these numbers are likely to be low. 
The PPM family includes PP2C and mitochondrial pyruvate
dehydrogenase phosphatase. Due to their highly divergent
primary sequences, few PPM members have been
isolated by homology-based methods and none have been
identified by genetic analysis. The only Drosophila PP2C
protein that had been previously known was identified by
genomic walking (Dick et al., 1997). Remarkably, the genome
project has uncovered at least 11 new PP2C-related
sequences, including one that closely resembles pyruvate
dehydrogenase phosphatase. The biological function of the
PPM family has been difficult to assess in mammalian cells
due to the lack of specific inhibitors that target these enzymes.
Recently, however, a PP2C protein has been found
to dephosphorylate CDC2 on Thr161 in yeast (Cheng et al.,
1999). Whether any of the PP2Cs perform a similar function
in Drosophila remains to be determined.

PTPs 

PTPs are found in all eukaryotic organisms, and are de- 
fined by the catalytic signature motif Cys-X5-Arg (for re- 
view see Neel and Tonks, 1997). The PTP superfamily 
consists of classical PTPs (RPTP, CPTP), dual specificity 
phosphatases (DSPs), and low molecular weight (LMW) 
PTPs. Approximately 38 PTPs are encoded in the fly ge- 
nome, including representatives of each class. Again, 
many more PTPs are found in the worm (109 total). It is 
interesting to note that the expansion of serine/threonine 
and tyrosine kinase families in worms has been accompa- 
nied by a corresponding expansion of both serine/threo- 
nine and tyrosine phosphatases. 
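
The Cys-X5-Arg signature mentioned above can be expressed as a simple pattern: a cysteine, any five residues, then an arginine. The following is a minimal sketch of how such a scan might look, not the method used in the survey; the function name and the toy sequence are hypothetical. A hit only flags a candidate site, since the motif alone does not establish that a protein is a PTP.

```python
import re

# Cys-X5-Arg: a cysteine, any five residues, then an arginine.
# The lookahead lets overlapping occurrences be reported as well.
PTP_MOTIF = re.compile(r"(?=(C.{5}R))")


def find_ptp_motifs(protein_seq):
    """Return (1-based position, matched heptapeptide) for each Cys-X5-Arg hit."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in PTP_MOTIF.finditer(seq)]


# Hypothetical toy sequence, not a real Drosophila protein.
toy = "MKVHCSAGIGRSGTFAAC"
print(find_ptp_motifs(toy))
```

A sequence in which the Cys or Arg is substituted, as described for some of the predicted fly proteins, would simply return no hit at that site.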

Members of the classical PTP family contain a con- 
served catalytic domain that is often fused to a large non- 
catalytic region. The PTP noncatalytic domains are quite 
diverse and can function to regulate enzyme activity and/ 
or mediate protein interactions. Like PTKs, classical PTPs 
can be divided into two groups, receptor PTPs (RPTPs) 
and cytoplasmic PTPs (CPTPs). Genetic studies in Drosophila
have been instrumental to our understanding of both
groups. In particular, experiments in the fly were among
the first to demonstrate the involvement of RPTPs in neuronal
axon guidance (for review see Desai et al., 1997; den
Hertog, 1999). Drosophila encodes ~8 RPTPs, at least 5
of which function in this capacity. Of the newly identified
RPTPs, one is related to mammalian RPTP-k and two
share homology with RPTP-X/1A2, a type 1 transmembrane
PTP implicated in nervous system development and
insulin-mediated pancreatic function. In regard to the
CPTP class, Drosophila studies on the CSW phosphatase
were pivotal in demonstrating that a CPTP could function
as a positive effector of cell signaling (Perkins et al., 1992).
CSW is a member of the SH2-domain containing PTPs
(SHP subclass). Mammals are known to have at least two
SHPs, whereas no additional SHP proteins were found in
Drosophila, indicating that flies, like worms, possess a single
SHP molecule. Overall the fly genome encodes at least
5 CPTPs, namely CSW, PTP-ER, and newly identified
CPTPs related to the mammalian MEG1, MEG2, and
PTPD1 phosphatases. Finally, Drosophila contains four
additional PTP-related proteins which are either difficult
to classify or represent incomplete phosphatase fragments.

DSPs are a diverse collection of phosphatase subgroups
that share little sequence homology outside of the conserved
Cys-X5-Arg motif with other DSP subgroups or
with members of the larger PTP family. DSPs were origi- 
nally characterized by their ability to dephosphorylate both 
serine/threonine and tyrosine residues; however, some of 
the DSP subgroups, namely PTEN and myotubularin, also 
possess lipid phosphatase activity (Maehama and Dixon, 
1999). Approximately 18 DSPs are found in Drosophila,
including representatives of the MAPK phosphatase (MKP),
PTEN, nuclear prenylated PRL, myotubularin, PIR1,
CDC14, and CDC25 phosphatase groups. Of the nine 
DSPs uncovered by the fly genome project, six belong to 
the MKP group, a remarkable finding considering the ex- 
traordinary effort spent studying MAPK pathways in 
Drosophila. Only Puckered, a negative regulator of the
JNK pathway, previously had been identified by genetic 
techniques (Martin-Blanco et al., 1998). The failure of the 
new MKPs to be uncovered by genetic analysis may indi- 
cate that they participate in MAPK pathways controlling 
subtle or unappreciated phenotypes. Alternatively, their 
functions may have been obscured by redundancy within 
the MKP group or with other phosphatases. Additional 
DSPs revealed by the genome project include enzymes re- 
lated to CDC14 and myotubularin. Interestingly, flies also 
contain three myotubularin-related sequences that lack 
the active site Cys and Arg residues. As has been sug- 
gested for similar mammalian myotubularin-related mole- 
cules, these proteins may function as antiphosphatases by 
binding to and protecting substrates from dephosphorylation
by myotubularin or related phosphatases (Hunter, 1998;
for review see Laporte et al., 1998). 

LMW-PTPs are ~150-amino acid residue cytoplasmic
enzymes that have been shown to possess tyrosine phosphatase
activity (Ostanin et al., 1995). Other than a strictly
conserved Cys-X5-Arg catalytic motif, LMW-PTPs bear
little resemblance to the other PTP members. Mammalian
LMW-PTPs have been implicated in EPH
(Stein et al., 1998) and PDGF receptor signaling (Chiarugi
et al., 2000); however, much remains to be learned regarding
the biological activity of these enzymes. Although two
putative LMW-PTPs are revealed by the Drosophila genome
project, both predicted proteins are larger than
would be expected (424 and 250 amino acids, respec- 
tively). The smaller protein contains a complete LMW- 
PTP domain but lacks the conserved Arg residue in the 
catalytic motif. Intriguingly, the larger protein has two 
complete LMW-PTP domains. Although the first domain 
has a mutation in the active site Cys residue and is likely to 
be inactive, the second domain contains an intact PTP cat- 
alytic motif and presumably has catalytic activity. If this 
protein is made in vivo, it would represent a new type of 
LMW-PTP having a tandem catalytic domain structure 
similar to that observed in many RPTPs. Whether this 
molecule is an authentic LMW-PTP and whether it has a 
human counterpart remains to be determined. 

Lipid Phosphatases 

Lipid inositol phosphatases play an important role in 
mediating the intracellular balance of second messenger 
phospholipids. Drosophila encodes approximately 20 inositol
phosphatases (IPP), only 2 of which were known previously.
Six inositol-1,4,5-triphosphate phosphatase-like
enzymes are contained in the fly genome; yet as is true 
for worms, no ortholog of the mammalian SH2-domain- 
containing inositol 5' phosphatase (SHIP) appears to be 
present. Drosophila does encode eight PPAP enzymes, 
which dephosphorylate phosphatidic acid to generate di- 
acylglycerol. The prototype member of this class, Wunen, 
was first identified in a genetic screen for factors controlling 
germ cell migration in the early Drosophila embryo (Zhang 
et al., 1996). Related proteins were subsequently identified
in yeast, worms, and mammals. Remarkably, the fly genome 
project reveals seven additional Wunen-like phosphatases. 
Also uncovered by the genome project are six members of 
the inositol monophosphate phosphatase (IMP) group. Both 
the Wunen-like and inositol monophosphate phosphatases 
are characterized by small tandem gene arrangements, sug- 
gesting a limited expansion of these phosphatase families in 
Drosophila. The large number of newly identified inositol 
phosphatases underscores the hitherto unappreciated im- 
portance of lipid phosphoregulation in the fly. 

Comparative Analysis of 
Phosphorylation-dependent Signaling Pathways 

With the completion of both the Drosophila and C. elegans
genome projects, together with our current knowledge
of mammalian signaling pathways, we can begin to
draw conclusions regarding the regulatory complexity of 
protein phosphorylation mechanisms across the evolution- 
ary spectrum. For example, in flies, worms, and humans, 
there is a high degree of structural and functional conser- 
vation between the components of the RTK and stress- 
activated signaling pathways, with the major difference 
being the number of isoforms present for individual path- 
way members. In higher organisms, the number of isoforms 
is increased, presumably providing greater potential for 
tissue- or stage-specific functions, signaling cross-talk, and 
regulatory complexity (Fig. 1). Significantly, differences 
in phosphorylation-mediated signaling cascades between 
worms, flies and humans become apparent when examin- 
ing the pathways involved in hematopoiesis and immunity. 
The JAK/STAT cascade, which has been implicated in he- 
matopoiesis and cytokine signaling, is present in humans 
and flies. Worms, however, lack JAK kinases but do pos- 
sess STAT proteins that are regulated by tyrosine phos- 
phorylation. Like humans, flies also contain the Toll/IKK/ 
NFkB pathway, which plays a role in the immune response 
to microbial organisms. No evidence of an inducible host 
defense system has been demonstrated in worms, consistent
with the lack of this pathway in C. elegans. Also missing
in the worm are the SYK/ZAP70 kinases which play an
important role in human T and B cell signaling. Drosophila
may possess some form of this pathway as indicated
by the presence of the fly SHARK kinase. The Drosophila
SHARK kinase is a member of the SYK/ZAP70 family;
however, it is most closely related to the HTK16 kinase of
Hydra based on the presence of ANK repeats which are
not found in any of the known mammalian SYK/ZAP70
family members (Chan et al., 1994; Ferrante et al., 1995).
Exact homologues of proteins functioning with SYK/ZAP70
in the mammalian hematopoietic cascade, including
the SLP-76, LAT, and BLNK adaptor proteins, the
LCK and LYN kinases, and the SHP-1 and SHIP phosphatases
were not revealed by the fly genome project; however,
Drosophila proteins with related biological activities
are found, namely SHP-2, inositol-1,4,5-triphosphate phosphatase,
and other SRC-kinase members. Thus, further
studies are required to determine whether a rudimentary
form of the SYK/ZAP70 pathway does function in flies.

Figure 1. Comparison of the protein kinase/phosphatase
signaling pathways in flies, worms, and humans (see text
for description). Kinases are depicted as black rectangles,
phosphatases are gray triangles, and other signaling
components are in white. Shapes in dotted lines indicate
mammalian proteins with no clear fly homologue; however,
the function of these components in the pathway
may be provided by other Drosophila proteins
with related biochemical activities.
Drosophila gene names are listed in parentheses.

The completion of the Drosophila genome project also 
allows us to look globally at the pathways in which many 
of the newly identified fly enzymes may function. In par- 
ticular, many of the proteins revealed in the Drosophila 
genome are orthologs of kinases and phosphatases known 
to function in the Rac/Rho/CDC42 signaling pathway 
(Citron, ACK2, MLK2, MEKK4, LIM-domain kinase,
PAK/STE20, and DSP members), in cell cycle regulation
(CDK7, BUB1, NEK1, NEK2, CDC14, CDC7, and
PP2C), and in pathways establishing asymmetry and cell
polarity (LKB1, SLK1, and EMK kinases). Whether these
enzymes went undetected for so many years because of 
functional redundancy or unappreciated phenotypes has 
yet to be determined. 

In conclusion, ~251 protein kinases and 86 phosphatases
have been identified in the Drosophila genome.
Although the overall number of fly enzymes is lower than
that found in C. elegans, the difference is largely due to the
worm-specific expansion of certain gene families. Interest- 
ingly, no large expansions or deletions of particular kinase 
or phosphatase gene families were uncovered by the 
Drosophila genome project. All of the previously known 
Drosophila kinases and phosphatases were detected in our 
analysis, confirming the relative completeness of the ge- 
nome sequence data. Remarkably, almost 170 new protein 
kinases and phosphatases were identified by the fly ge- 
nome project (Table I). The next challenge for scientists 
will be to determine the role of these enzymes in Dro- 
sophila development and physiology. 
Caenorhabditis elegans should soon be the first multicellular organism
whose complete genomic sequence has been determined.
This achievement provides a unique opportunity for a comprehensive
assessment of the signal transduction molecules required for
the existence of a multicellular animal. Although the worm C.
elegans may not much resemble humans, the molecules that
regulate signal transduction in these two organisms prove to be
quite similar. We focus here on the content and diversity of protein
kinases present in worms, together with an assessment of other
classes of proteins that regulate protein phosphorylation. By systematic
analysis of the 19,099 predicted C. elegans proteins, and
thorough analysis of the finished and unfinished genomic sequences,
we have identified 411 full length protein kinases and 21
partial kinase fragments. We also describe 82 additional proteins
that are predicted to be structurally similar to conventional protein
kinases even though they share minimal primary sequence identity.
Finally, the richness of phosphorylation-dependent signaling
pathways in worms is further supported with the identification of
185 protein phosphatases and 128 phosphoprotein-binding domains
(SH2, PTB, STYX, SBF, 14-3-3, FHA, and WW) in the worm
genome.

Reversible protein phosphorylation plays a central role in 
regulating basic functions of all eukaryotes such as DNA 
replication, cell cycle control, gene transcription, protein trans- 
lation, and energy metabolism. Protein phosphorylation is also 
required for more advanced functions in higher eukaryotes such 
as cell, organ, and limb differentiation, cell survival, synaptic 
transmission, cell-substratum and cell-cell communication, and 
to mediate complex interactions with the external environment. 
Because aberrant protein phosphorylation is commonly the 
cause of cancer and other human diseases, a comprehensive 
knowledge of the key enzymes that regulate these functions can 
provide the basis for novel therapeutic intervention strategies. 

The genomic revolution promises to provide a new paradigm for
drug discovery, allowing one to selectively target the molecular
basis of human disease. The completion of the Caenorhabditis
elegans genome sequence gives us an opportunity to decipher the
molecular nature of its signal transduction machinery. Several
global analyses of proteins and protein domains present in C.
elegans have been presented elsewhere (1-4), revealing that protein
kinases comprise the second largest family of protein domains in
worms. The three most frequently occurring protein domains found
in worms are seven transmembrane chemoreceptors (650 domains,
3.5% of genome), protein kinases (496 domains, 2.6% of genome),
and zinc finger C4 domains, including nuclear hormone receptors
(275 domains, 1.4% of genome). A more in-depth analysis has been
performed on the 535 worm proteins containing zinc-binding
domains, including the C4, C2H2, and C3HC4 ring finger types (3),
and on the 83 worm homeobox transcription factors (4). Here, we
present a comparative analysis of the enzymes and adaptor molecules
that are the key components of the protein phosphorylation
signaling network present in C. elegans.

Identification and Classification of C. elegans Protein Kinases. To
identify worm protein kinases, we first used an HMMER 2.1.1
(http://hmmer.wustl.edu/) profile search against the 19,099 predicted
worm proteins, the finished and unfinished C. elegans genomic
sequence, and the worm chromosome assemblies. The nucleic acid 
databases were first translated in all six frames, and ORFs longer 
than 30 amino acids were parsed into a relational database. We 
generated a hidden Markov model based on 70 representative yeast 
and human protein kinases whose catalytic domains share <50% 
sequence identity with each other (5). Using a similar strategy, 
additional profiles were generated for other protein kinase-like 
domains (phosphoinositide kinases, atypical A6 kinases, diacylglyc- 
erol kinases, aminoglycoside resistance kinases, and microbial 
kinases), protein phosphatases, and domains capable of specifically 
binding to phosphotyrosine (PTyr) or phosphoserine/threonine
residues (SH2, PTB, STYX, SBF, 14-3-3, FHA, and WW domains). 
Scripts were written for reassembly of contiguous exons identified 
from genomic sequence to generate the predicted catalytic domain 
sequence of each kinase. Pairwise BLAST 2.0
(ftp://ncbi.nlm.nih.gov/blast/executables/) analysis was performed to identify redundant
entries, and putative protein kinases with low profile scores were 
manually inspected to determine whether they should be included 
in subsequent analyses. 
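
The six-frame translation step described above can be sketched as follows. This is an illustrative reconstruction, not the authors' scripts: it assumes the standard genetic code, treats an "ORF" as any stop-free stretch of translated sequence (no start codon required), and uses hypothetical function names. Only the 30-residue length cutoff comes from the text.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative codon table is needed for nuclear genomic sequence).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_orfs(dna, min_len=30):
    """Yield (frame, peptide) for stop-free stretches longer than min_len."""
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq[offset:])
            for segment in protein.split("*"):
                if len(segment) > min_len:
                    yield (f"{strand}{offset + 1}", segment)


# Hypothetical toy input: a start codon, 35 alanine codons, and a stop.
toy = "ATG" + "GCT" * 35 + "TAA"
for frame, peptide in six_frame_orfs(toy):
    print(frame, len(peptide))
```

In a pipeline like the one described, the peptides yielded here would be the records loaded into the database and searched with the kinase profile.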

This analysis generated a nonredundant list of 493 protein 
kinase-like proteins and 21 protein kinase gene fragments from 
worms. This number will continue to increase as the genome is 
completed and the final assembly of the six worm chromosomes is 
achieved. Of note, we found >40 kinase domains from genomic 
analysis that were absent in the 19,099 worm protein dataset. These 
omissions result from the limitations of current protein prediction 
algorithms. Furthermore, numerous entries had apparent internal 
deletions of conserved kinase motifs, likely attributable to inap- 
propriately assigned splice junctions. These sequences were cor- 
rected before further classification. Many of the 19,099 proteins 
were alternate isoforms of the same gene, in which case we included
only one of the proteins in our final assessment. In determining the
total number of protein kinases, the three proteins determined to
contain dual catalytic domains were only counted once. Many of the
protein ORFs truncated the extremities of the kinase domain
proteins, frequently because of their location near the end of a
cosmid clone. In these cases, we searched for N- or C-terminal
domains on adjacent cosmids to assist in the subsequent classification.
One challenge of genomic data mining is the presence of
sequence repeats. Tandem repeats and inverted repeats account for
2.7 and 3.6% of the worm genome, respectively. In addition, worms
contain large regions of tandem gene duplication, ranging from
hundreds of bases to >100,000 bases (1). In some cases, the genes
encoded within these regions are duplicated and have nearly
identical sequences. Therefore, until the chromosome sequences
are fully assembled, data-mining approaches may exclude some of
these duplicated genes.

Table 1. Summary and classification of phosphoprotein signaling
molecules in worms, budding yeast, and humans (table body not
recovered; surviving row labels: Superfamily, Protein kinase, PK-like).

Fig. 1. Hyperbolic tree representation of C. elegans protein kinases. Major
protein kinase groups are labeled in different colors. A Java tool for viewing this
dendrogram can be found at www.kinase.com.

phosphotransferases; PTP, protein-tyrosine phosphatase.

A multiple sequence alignment was generated from the predicted 
catalytic domains of 398 of these protein kinases, which share >15%
amino acid identity with other entries. The aligned proteins were 
then clustered by using parsimony analysis, and the results were 
displayed as rooted and unrooted cluster dendrograms, and as
kinase "retinograms" or hyperbolic trees using a Java display tool
(Fig. 1 and www.kinase.com). The protein kinases were then
classified into several kinase groups and families, based on relat- 
edness within the kinase catalytic domain to other worm, yeast, and 
vertebrate protein kinases. Further classification was performed by 
searching for noncatalytic domains linked to the kinase domain, 
including predicted transmembrane regions, SH2 domains and SH3 
domains, and Ig and fibronectin Type III domains. 
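
The >15% identity criterion used to select catalytic domains for alignment could be applied along the following lines. This is a sketch under stated assumptions: the text does not say how identity was computed, so here it is taken over alignment columns where both sequences have a residue, and the function names and the toy alignment are hypothetical.

```python
def percent_identity(a, b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    paired = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not paired:
        return 0.0
    matches = sum(1 for x, y in paired if x == y)
    return 100.0 * matches / len(paired)


def entries_above_threshold(alignment, threshold=15.0):
    """Names of entries sharing more than `threshold` percent identity
    with at least one other entry in the alignment."""
    names = list(alignment)
    keep = []
    for name in names:
        if any(
            percent_identity(alignment[name], alignment[other]) > threshold
            for other in names
            if other != name
        ):
            keep.append(name)
    return keep


# Hypothetical toy alignment; real input would be aligned catalytic domains.
toy_alignment = {
    "kinA": "ACDEFGHIKL",
    "kinB": "ACDEFGHIKV",
    "odd1": "WWWWWWWWWW",
}
print(entries_above_threshold(toy_alignment))
```

Entries that fail the threshold, like the 95 kinase-like proteins left out of the 398-member alignment, would need to be classified by other means.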

Table 1 presents a summary of our classification of the 411
protein kinases and 82 protein kinase-like motifs. A more detailed 
table of these proteins, along with basic informatics tools for 
retrieval and alignment of these sequences can be found on our web 
site at www.kinase.com. Table 1 also summarizes the results of a 
similar analysis of the completed yeast genome and of an ongoing 
effort from publicly available human expressed sequence tag and 
genomic databases. From this classification, we can now determine 
which protein kinases are conserved between yeast and worms, we 
can speculate on the origin of the protein kinase superfamily, and 
we can identify kinases that are yeast-specific and those that are 
restricted to higher eukaryotes. We tentatively identify "worm- 
specific" protein kinases, based on their absence from current 
mammalian expressed sequence tag and nucleic acid databases.
However, a final assessment will have to await completion of the 
Drosophila and human genome sequences. We also elaborate on 
some of the protein kinases and signaling pathways that evolution- 
arily appear only in more complex organisms such as vertebrates. 

In this review, we use the term "orthologues" to refer to proteins 
of different species that are believed to have a common ancestor 
and have an evolutionarily conserved function. Orthologous pro- 
teins typically have similar domain structure and share extended 
sequence similarity outside of their catalytic domains. Homologous 
proteins also share extended sequence similarity, but to a lesser 
degree than orthologues, and are not expected to complement one 
another functionally. However, within large protein superfamilies 
such as protein kinases, G protein coupled receptors, and nuclear 
hormone receptors, there is not a single expectation value that can 
be used to categorize all members definitively, and final classifica- 
tion will require experimental validation. 

Yeast- and Fungal-Specific Kinases. The first complete eukaryote 
sequence, that of the budding yeast Saccharomyces cerevisiae, was 
reported in 1996 (6). Shortly thereafter, we presented a compre- 
hensive analysis and classification of yeast protein kinases (7). Now, 
with the availability of a second eukaryotic genome, C. elegans, we
can perform a similar analysis and make more informed generalizations
on which of these protein kinases are unique to yeast or
fungi, and also on which protein kinases evolved during the 
emergence of multicellular organisms and are therefore not rep- 
resented in yeast or fungi. 

We now identify a total of 24 yeast-specific protein kinases and 
an additional 3 that are currently restricted to yeast and worms. 
Originally we defined four protein kinase subfamilies, containing a 
total of 18 members, to be yeast specific [protein kinase A (PKA)- 
related, RAN, ELM, and NPR/HAL5 families]. These remain 
yeast- or fungal-specific, as no close homologues are present in 
worms, and none have yet been described in vertebrates. However, 
the ELM family could be considered as a subfamily of the CAMK 
group. Rim15 is a yeast-specific kinase that is related to Schizosaccharomyces
pombe Cek1, and its similarity to budding yeast
YNL161w places it as a distant member of the NDR family kinases.
Two other protein kinase subfamilies, containing a total of five
members, were originally recognized as having only distant homologues
in higher organisms (NEK-like and PIM-like families). The
prototype of the NEK-like family, YNL020C, has a homologue in
worms, but not in mammals, although its C-terminal tail has a
predicted coiled-coil structure related to numerous mammalian
protein kinases (e.g., SLK/PLKK, TAK1). The two yeast PIM-like
family members have catalytic domains related to worm and
mammalian protein kinases, but have a unique N-terminal domain.

Members of the NPR/HAL5 family are involved in ion ho- 
meostasis, polyamine transport, nutrient uptake, and response to 
nitrogen starvation, whereas Elm1 initiates a protein kinase cascade
controlling pseudohyphal growth (8). Members of the RAN family
are related to fission yeast Ran1/Pat1, which regulates the switch
between vegetative growth and meiosis. Because these are fungal-specific
responses, it is not surprising that these protein kinases are
restricted to lower eukaryotes. 

A second set of "unique" yeast protein kinases was originally 
defined because they had no close homologues in other species (7). 
Most of these yeast protein kinases now have both worm and 
vertebrate orthologues (Cdc5, Ipl1, Ire1, Vps15, YGL180W/Apg1,
Swe1, Spk1, Gcn2, YBR274W, YGR262C, and Bub1). Exceptions
among this list of unique yeast protein kinases are YPL236C and
Mps1, which have orthologues in humans, but not in worms;
YKL116C, which is distantly related to the EMK-family, yet has
only weak homologues in worms and humans; and YKL171W,
YGR052W, and YPR106W, which remain yeast specific protein
kinases. Two sequences that were excluded from our previous
analysis of yeast protein kinases deserve mention. The budding
yeast protein Iks1 can be classified as a yeast-specific protein kinase
because it still has no homologues in worms or other species,
whereas another yeast kinase-like sequence, SCY1, has orthologues
in C. elegans and Arabidopsis, but none thus far in vertebrates. An S.
pombe protein, which is distantly related to SCY1, also has a single
worm orthologue. 

Worm-Specific Protein Kinases. Which protein kinases are specific to 
worms? Protein kinases that are absent from yeast yet present in 
worms are likely to be involved in the complex signal transduction 
pathways that are required for the existence of multicellular or- 
ganisms. These might include protein kinases involved in cell- 
substratum and cell-cell adhesion, transmembrane signaling in 
response to humoral factors, protein kinases involved in cell survival 
or programmed cell death, and protein kinases whose signals 
regulate metazoan-specific transcription factors, particularly those
containing Zn-finger domains. 

In the absence of complete genome sequences of other multi- 
cellular eukaryotes, we tentatively classify 165 protein kinases (plus 
9 protein kinase fragments) as worm-specific. The majority (134, 
80%) fall into three groups (CK1, FER, and KIN-15) whereas the
others are distant members of common protein kinase families or 
belong to worm specific subfamilies. Five protein kinase subfamilies,
containing a total of 12 members, can tentatively be defined as
worm-specific (C04G2.10, K08B4.5, K09C6.7, R107.4, and
ZK177.2-families). An additional 15 unique worm protein kinases
are also identified, which to date have no close homologues in yeast, 
worms, or in higher organisms. However, mammalian homologues 
of some of these worm protein kinases are already beginning to 
appear in publicly available expressed sequence tag databases, and 
assignment of a protein kinase as being truly worm-specific will 
have to await the completion of the Drosophila and human genome 
sequences. 

Members of four other protein kinase or kinase-like subfamilies 
are disproportionately represented in worms compared with hu- 
mans. Clusters of 5-9 members of each of these families are 
localized to short regions (<1 megabase) of chromosomes II and 
IV, suggesting they may each have expanded as a result of extensive 
tandem gene duplication. The chromosomal density of protein 
kinases is graphically depicted on our web site at www.kinase.com. 
The four gene families are the CK1-family, the KIN-15-family of
receptor protein-tyrosine kinases, the FER-family of cytoplasmic 
protein-tyrosine kinases, and the kinase-like domains of the recep- 
tor guanylyl cyclases. 
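
The clustering observation above (5 or more family members within a region shorter than 1 megabase) suggests a simple positional scan. The sketch below is one way such clusters might be flagged; it is not the authors' procedure. The input tuple layout, the greedy non-overlapping windowing, and all gene names and coordinates in the example are hypothetical.

```python
from collections import defaultdict


def find_clusters(genes, min_members=5, max_span=1_000_000):
    """Flag groups of same-family genes packed into a short chromosomal region.

    genes: iterable of (name, family, chromosome, start_bp) tuples.
    Returns a list of (family, chromosome, [gene names in positional order]).
    """
    by_group = defaultdict(list)
    for name, family, chrom, start in genes:
        by_group[(family, chrom)].append((start, name))

    clusters = []
    for (family, chrom), members in sorted(by_group.items()):
        members.sort()
        i = 0
        while i < len(members):
            # Extend the window while the next gene starts within max_span
            # of the first gene in the window.
            j = i
            while j + 1 < len(members) and members[j + 1][0] - members[i][0] < max_span:
                j += 1
            if j - i + 1 >= min_members:
                clusters.append((family, chrom, [n for _, n in members[i:j + 1]]))
                i = j + 1
            else:
                i += 1
    return clusters


# Hypothetical coordinates for illustration only.
toy_genes = [
    ("f1", "FER", "IV", 100_000),
    ("f2", "FER", "IV", 200_000),
    ("f3", "FER", "IV", 300_000),
    ("f4", "FER", "IV", 400_000),
    ("f5", "FER", "IV", 500_000),
    ("f6", "FER", "IV", 5_000_000),
    ("c1", "CK1", "II", 0),
    ("c2", "CK1", "II", 10),
]
print(find_clusters(toy_genes))
```

Proximity alone does not prove tandem duplication; it only identifies regions worth examining for the near-identical paralogues the text describes.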

CK1 family. The worm genome contains 87 CK1 (casein kinase
I) members (plus 7 additional partial catalytic domains) whereas
there are only 4 known members in budding yeast and 6 in humans.
Genetic evidence from the yeast homologues suggests CK1s may be
involved in DNA repair and cell division, and mammalian CK1s
have been shown to phosphorylate p53 in G1 and G2, possibly
affecting cell sensitivity to DNA damage at these checkpoints (9).
Little is known regarding the function of CK1s in worms, but the
enormous arborization and diversification of this kinase family may
be an adaptation allowing for enhanced DNA repair in response to
excessive exposure to environmental mutagens.

KIN-15/16 family. C. elegans contains 16 members of a unique
family of receptor protein-tyrosine kinases whose presence to date 
is restricted to this species. These transmembrane proteins have 
unusually short (<50-aa) extracellular domains, and many are 
clustered within the genome, as though they arose through tandem 
gene duplication. The prototype members of this family, KIN-15 
and KIN-16, are expressed in the hypodermal syncytium, which 
expands by cell fusion during larval development (10). Compared 
with wild-type worms, KIN-15 and KIN-16 deletion mutants produce
fewer embryos and rarely develop into adults, but, when they
do mature, they typically exhibit extrusion of the gonads through the 
vulva (11). Therefore, KIN-15/16 appear to be essential genes, yet 
may undergo variable compensation by 1 of the 14 other homo- 
logues. One of the KIN-15 clusters is interspersed with chitinase 
genes, which are known to function in cell wall morphogenesis 
during the molting process and in fungal resistance. Expansion of 
this region may have been necessary during evolution to facilitate 
this aspect of larval development. An alternative function for 
KIN-15-family kinases is suggested by the fact that overexpression
of TKR-1 (C08H9.5) causes a 40-100% extension of life expectancy
in worms (12). Unlike other life extension (age) mutants, TKR-1
transgenics do not form dauers, and their longevity has been 
attributed to an increased resistance to ultraviolet and thermal 
stress. 

FER family. The worm genome contains 42 members (plus 2 
additional partial catalytic domains) of the FER-family of single 
SH2-containing cytoplasmic protein-tyrosine kinases. Most of 
these genes are interspersed throughout the worm genome; how- 
ever, nine members reside within a 1.1-megabase region on chro- 
mosome IV. Unfortunately, no literature is available on the func- 
tion of any of these protein kinases in worms, but the two mam- 
malian homologues, FER and FES, have been demonstrated to play 
a role in cell adhesion, to signal downstream of cytokine receptors,
and to function as oncogenes (13). Conceivably, additional human
representatives will be revealed on completion of the human 
genome sequence, possibly with restricted expression. Alterna-
tively, their function may be replaced in humans by expansion and
diversification of non-FER cytoplasmic protein-tyrosine kinases, of
which worms have only 10 whereas humans have at least 34. Most
evident is a dramatic expansion of SRC-family kinases and emer-
gence of ZAP70 and JAK family kinases in higher eukaryotes that
are not found in the worm genome. 

Conserved Metazoan Protein Kinase Signaling Transduction Pathways. 

Worms provide an elegant model system for studying signal trans- 
duction. This transparent animal is comprised of 959 somatic cells 
plus 131 cells destined for programmed cell death. The C. elegans 
hermaphrodite contains 302 neurons and 81 muscle cells and has a 
brain, reproductive system, and digestive tract (ref. 14;
http://dauerdigs.biosci.missouri.edu/Dauer-World/Wormintro.html). It
provides a complex yet tractable system for studying development, 
metabolism, aging, and behavioral responses to a number of stimuli. 
Regulation of many of these processes is carried out through signal 
transduction pathways that are also present in humans. Not sur- 
prisingly, all of the major protein kinase groups found in worms are 
also conserved in humans (15). The number of protein kinases 
classified into each major group from yeast and worms, along with 
a current estimate from humans, is provided in Table 1. These 
numbers represent a current analysis, but new protein kinases are 
being discovered every month as the worm genome sequencing 
project continues. Some of these entries may also represent pseu-
dogenes containing frameshifts that result in incomplete translation
into a full kinase catalytic domain. 

AGC Group. The AGC group of worm protein kinases contains 
representatives of many of the known types of cyclic nucleotide- 
dependent, NDR or DBF2, and ribosomal S6 kinase families. 
Worms also contain members of the cGMP-dependent kinase 
(PKG), RSK, and G-protein coupled receptor kinase families that 
are absent from budding yeast. Two of the S6 kinase members have 
dual catalytic domains similar to vertebrate RSK enzymes, where 
the N-terminal domain clusters into the AGC group and the 
C-terminal kinase domain is most related to the CaMK group. 
Worms have four members of the AKT family, two being close 
orthologues of mammalian AKT1/PKB/RACα, and two related to
the AKT upstream kinase, PDK1. AKT is a mammalian protoon-
coprotein regulated by phosphatidylinositol 3-kinase (PI3-K),
which appears to function as a cell survival signal to protect cells
from apoptosis (16). Insulin receptor, RAS, PI3-K, and PDK1 all
act as upstream activators of AKT whereas the lipid phosphatase 
PTEN functions as a negative regulator of the PI3-K/AKT pathway 
(17). Downstream targets for AKT-mediated cell survival include 
the proapoptotic factors BAD and Caspase9 and transcription 
factors in the forkhead family, such as DAF-16 in the worm. AKT 
is also an essential mediator in insulin signaling, in part because of 
its use of GSK-3 as another downstream target. Each of these 
components of the AKT/PI3-K pathway is conserved in worms, 
providing a powerful system for genetic dissection of a major cell 
survival signal. 

The cAMP-dependent protein kinases (PKA) consist of het- 
erotetramers comprised of two catalytic (C) and two regulatory (R) 
subunits, in which the R subunits bind to the second messenger 
cAMP, leading to dissociation of the active C subunits from the 
complex. Worms have two PKA catalytic domains and two regu- 
latory subunit genes (R07E4.6 and ZK370.4). Additional cNMP- 
binding domains are present in the two worm representatives of the 
PKG family, in several cNMP-gated ion channels, and in a cAMP- 
regulated guanine nucleotide exchange factor (T20G5.5). 

CaMK Group. In the CaMK group, the most abundant representa-
tives include Ca2+/calmodulin-regulated and AMP-dependent
protein kinases and EMK-related kinases. Worms also contain 
members of the death-associated protein kinase, mitogen-activated 
protein kinase (MAPK)-associated protein kinase, myosin light 
chain kinase, and phosphorylase kinase families that are absent
from budding yeast. All of these protein kinase families have likely
evolved as a result of the demands of multicellularity and the 
emergence of complex organ systems. For example, even though 
yeast have myosin homologues, they lack myosin light chain kinases. 
These protein kinases have presumably evolved to regulate myosin 
during muscle contraction. A worm contig still under construction 
appears to contain a phosphorylase kinase catalytic γ subunit
orthologue, consistent with the presence of two orthologues of the
noncatalytic phosphorylase kinase α subunits, which facilitate
calmodulin-binding and are required for activation of the mamma- 
lian holoenzyme. 

Worms lack a homologue of the mammalian Trio-family kinases. 
Trio is a large multidomain protein kinase containing Ras and Rho 
guanine exchange factor domains in addition to PH, SH3, and 
spectrin domains (18). Trio may link Rho and Rac signaling 
pathways and appears to be involved in the cytoskeletal changes 
required for cell migration. Although worms lack a member of this 
kinase family, they do have at least two proteins related to the entire 
noncatalytic domain of Trio (UNC-73 and F55C7.7). 

We have also identified a forkhead homology (FHA) domain-
containing CHK2 orthologue in worms. In yeast, Spk1/Rad53
functions as a DNA damage checkpoint sensor through its FHA
domain interacting with phosphorylated Rad9 (19). Although no
close orthologue of Spk1 exists in metazoans, this function appears
to be replaced by CHK2/CDS1, which is phosphorylated in re-
sponse to DNA damage and may work in conjunction with CHK1
kinase to phosphorylate CDC25C to prevent premature entry into 
mitosis (20). 

CMGC Group. In the CMGC group of serine/threonine kinases, all 
of the main subfamilies are conserved between yeast, worms, and 
mammals, including cyclin-dependent kinase (CDK), MAPK, 
GSK-3, and CLK. An exception is the RCK family, which is absent 
from yeast but has two members in worms and at least seven in 
humans. The worm RCK kinases are most similar to mammalian 
MAK, or male germ cell-associated kinase, which has been impli- 
cated in spermatogenic meiosis and in signal transduction pathways 
for sight and smell. Worms have 14 CDKs (compared with 5 CDKs 
in yeast) including orthologues of CDC2, CDK3, CDK5, CDK7, 
and CDK8, and contain 34 cyclins, compared with 23 in budding 
yeast (Table 1), including one cyclin H orthologue, which we predict 
will interact with worm CDK-7 to generate a functional cyclin- 
activated kinase. 

Worms have 14 MAPKs, compared with 6 in yeast and at least 
14 in humans. The worm MAPKs include representatives of each 
of the major types of MAPKs: ERK/MAPK, JNK/SAPK1, p38/
SAPK2, BMK/ERK5, and NEMO-like kinase (NLK) (21). In
budding yeast, three protein kinase families (the prototypes being
Ste20, Ste11, and Ste7) function upstream of the MAPKs to
generate at least four distinct MAPK signaling pathways that 
mediate the response to pheromone, nutritional starvation, and 
cellular or osmotic stress. In multicellular organisms, these MAPK 
cascades have evolved to mediate responses to diverse signals 
including growth factors, mitogens, hormones, and cytokines, in 
addition to the more primitive stress responses to anoxia, heat 
shock, and osmotic stress. 

STE Group: MAPK Pathways. The STE family refers to the three
classes of protein kinases that lie sequentially upstream of the
MAPKs. In worms, this group includes 10 STE7 (MEK or
MAPKK) kinases, 2 STE11 (MEKK or MAPKKK) kinases, and 12
STE20 (MEKKK) kinases. Based on the number of MAPK and
STE-family kinases in C. elegans, we predict worms will contain at
least 8-10 MAPK pathways. In humans, several protein kinase
families that bear only distant homology with the STE11 family also
operate at the level of MAPKKKs, including RAF, MLK, TAK1,
and COT. Except for COT, worms also have orthologues of each
of these kinases. Because crosstalk takes place between protein
Fig. 2. Schematic representation of the human and C. elegans receptor protein-tyrosine kinase families. Catalytic domains are shown in yellow. The names of the
human RTKs are in black, and the names of the worm RTKs are in red.



kinases functioning at different levels of the MAPK cascade, the 
large number of STE family kinases could translate into an enor- 
mous potential for upstream signal specificity and diversity. 

Protein-Tyrosine Kinase Group: Receptor Protein-Tyrosine Kinases
(RTKs). The largest group of protein kinases in worms are the
protein-tyrosine kinases (PTKs), with 92 members and 5 fragments.
We predict this will also remain the largest group of protein kinases
in higher eukaryotes, including humans, where the current count is
~100. These numbers are impressive when one considers that this
family is absent from budding yeast. Yeast, however, do have a
"budding" tyrosine phosphorylation signaling system, with several
dual-specificity kinases (CLK-like, Ste7/MEK family, Swe1, Spk1/
Rad53, Mps1), an atypical A6 PTK, 3 protein-tyrosine phospha-
tases, 16 dual-specificity and low molecular weight phosphatases,
and 6 "infant" P.Tyr-binding proteins comprising an apparently
nonfunctional SH2 domain protein and 5 phosphatase-like STYX
domains. Budding yeast lack PTB domains, and none of the six
potential P.Tyr-binding domains have been functionally verified.

The 92 worm PTKs can be further classified into receptor 
protein-tyrosine kinases (RTKs) and cytoplasmic protein-tyrosine 
kinases (CTKs) based on the presence or absence of a transmem- 
brane domain and SH2 or SH3 domains. Based on this analysis, the 
worm genome contains 40 RTKs and 52 CTKs. The 40 RTKs 
include 16 members of the worm-specific KIN-15-family, 13 RTKs
with orthologues representing 10 of the 20 families of human RTKs,
and 11 RTKs that remain unclassified with no identifiable mam-
malian counterpart (Fig. 2). Genetic studies in worms support the
classification of five of these worm-human pairs, including LET-
23/EGF receptor, DAF-2/insulin receptor, EGL-15/FGF recep-
tor, CAM-1/ROR1 receptor, and VAB-1/EPH receptor, and each
of these orthologous pairs mediates similar functions in worms and 
man, with specificity for epidermal, metabolic, mesodermal, and 
neuronal signaling pathways. 
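
The counts quoted in the two preceding paragraphs can be cross-checked with a few lines of Python. This is only an illustrative sketch: the numbers are taken from the text, while the dictionary names and the `tally` helper are assumptions introduced here and are not part of the original analysis.

```python
# Counts as quoted in the text; the variable names are hypothetical.
WORM_PTKS = {"RTK": 40, "CTK": 52}

WORM_RTKS = {
    "KIN-15 family (worm specific)": 16,
    "orthologues of human RTK families": 13,
    "unclassified": 11,
}


def tally(groups):
    """Sum the member counts of a {group name: count} mapping."""
    return sum(groups.values())


def rtk_counts_are_consistent():
    """True when the sub-counts add up to the stated totals (92 PTKs, 40 RTKs)."""
    return tally(WORM_PTKS) == 92 and tally(WORM_RTKS) == WORM_PTKS["RTK"]


if __name__ == "__main__":
    print("PTK total:", tally(WORM_PTKS))
    print("RTK total:", tally(WORM_RTKS))
    print("consistent:", rtk_counts_are_consistent())
```

Both sums agree with the stated totals, so the RTK breakdown is internally consistent.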

Based on extracellular domain homologies, we also predict three
worm orthologues of PDGFR/FLK/VEGFR, two for DDR, and 
one each of RYK, ROS, and LTK/ALK. Two of the unclassified 
RTKs have weak similarity to MET, but not enough to warrant 
inclusion into this family. Missing in C. elegans are TRK/nerve
growth factor receptors, AXL/TYRO3, TIE/angiopoietin recep-
tor, RET/GDNF receptor, and MUSK family members. Identifi-
cation of three members of the PDGFR/VEGFR family is signif-
icant, as they emerged only through analysis of the genomic data 
and failed to be properly identified from a recent analysis of the 
predicted 19,099 proteins. Each of these receptors contains multiple 
Ig-like extracellular domains and a single split kinase domain with
closest homology to human FLT1/VEGFR1 and the C. elegans
KIN-15 family. However, they are likely to represent early ancestors
to both the FLK and PDGFR kinase lineages. Expression of the
mammalian FLK/VEGFR RTKs is primarily restricted to endo-
thelial cells, and they play important roles in the early differentia-
tion of hematopoietic and endothelial lineages as well as in normal
and pathologic angiogenesis in the adult. However, because worms
lack a vasculature, the function of these receptors is not obvious.
The formation of mammalian vasculature is reminiscent of the
process by which networks of branching tubes develop into the lung
and kidneys. Invertebrate VEGFRs may therefore be involved in 
processes that later evolved into a program for limb and organ 
development in vertebrates. 

Surprisingly little is known about how the ligand-activated 
VEGFRs mediate these effects. Gene knockout studies in mice 
suggest that A-RAF or MEKK1 may function downstream of
VEGFRs, and recent evidence implicates the involvement of 
STATs (signal transducer and activator of transcription) in VEGFR 
signaling (22). Genomic analysis reveals two worm orthologues of 
STATs (Y51H4, Y43D4 unfinished and F58E6.1), making the 
VEGFR-STAT association an attractive area for further investiga- 
tion. STATs contain an SH2 domain, a tyrosine phosphorylation 
domain, and a DNA-binding domain, and function in a unique 
JAK-STAT signaling pathway. Extensive studies in mammalian 
systems have established a model in which JAK kinases are 
constitutively bound to the cytoplasmic portion of cytokine recep-
tors and are activated on receptor dimerization, facilitating recruit- 
ment of STATs to the receptor complex. Subsequent STAT phos- 
phorylation leads to their dimerization and translocation to the 
nucleus, where they function directly as transcription factors. Dro- 
sophila and Dictyostelium STATs both regulate cell division and 
pattern formation (23, 24). Drosophila STAT has been genetically 
and biochemically linked to a JAK-STAT signal transduction 
pathway that regulates pair-rule genes and hematopoiesis. Dictyo-
stelium STAT plays an essential role during the differentiation and
aggregation of independent spore cells into stalk cells in response
to the chemical signal referred to as differentiation-inducing factor.
Furthermore, the Dictyostelium AX2 PTK has a second kinase-like
domain found only in JAK-family kinases, suggesting the existence 
of a signaling network similar to that in flies and mammals. 
However, worms have no cytokines, no cytokine receptors, and no 
JAK-family kinases. Possibly, the JAK kinase function is replaced 
by a worm-specific FER kinase, or the STATs may have initially
evolved to serve an alternative purpose. Mammalian STATs are
also involved in signaling through receptors for growth factors such
as EGF, PDGF, VEGF, and angiopoietins. Because the EGF and
VEGF signaling systems are present in worms, it is tempting to 
speculate that these represent the primordial raison d'etre for 
STATs. 

In general, related RTKs bind related ligands. In humans, there
are at least 12 ligands, encoded by 10 genes, that have been shown 
to bind selectively to at least one of the four known EGFR-family 
members. Each of these ligands shares a conserved six-cysteine 
pattern in its receptor binding domain. In worms, LIN-3 has been 
shown to function as a LET-23 ligand. Although EGF motifs are 
prevalent in worms, we have identified three EGF-like proteins
(F58G4.4, Y69H2.2, YG7()G10A.2) that, in addition to the six 
cysteines, conserve many of the crucial receptor-binding residues 
and are juxtaposed next to a putative transmembrane domain, in a 
pattern similar to all known EGFR-family ligands. Worms also 
contain at least 3 FGF-like ligands, 12 insulin-like ligands (many
more on inclusion of relaxin-related ligands), 2 distant homologues 
of VEGF, and 4 ephrin-related ligands, some of which would be 
predicted to bind to their cognate receptors. 

Orthologues of other RTK ligands prove more difficult to identify 
empirically. We see no evidence for a bona fide PDGF or NGF, and 
searches for ligands for MET, TIE, and AXL-family RTKs are con- 
founded by their similarity to plasminogen, fibrinogen, and fibrillin, 
respectively. Furthermore, except for weak homologues of MET, these 
three RTK families are absent from worms. Nevertheless, the signifi- 
cance of a putative Ang2-like protein (Y43C5A2) in the absence of a 
TIE-family RTK remains to be determined. 

Protein-Tyrosine Kinase Group: CTKs. Most of the 52 CTKs in worms 
belong to the single SH2-containing FER family. Of the remaining 
10 CTKs, there are 2 orthologues of the SH3-containing ACK, and 
1 each of FYN (SRC family CTK), FRK, CSK, ABL, and FAK, plus 
3 unclassified CTKs. In vertebrates, CSK negatively regulates 
FYN-family kinases by phosphorylation of a C-terminal tyrosine 
facilitating a conformational change through an intramolecular 
SH2-P.Tyr interaction (25). We predict a similar functional inter- 
action between worm FYN and CSK. Co-evolution of this regula- 
tory pair suggests even early metazoans required a means to 
dampen signaling through CTKs. Notably absent in worms are 
protein kinases related to the ZAP70 and JAK CTKs, whose
primary role in mammals is in signaling through the T cell and 
cytokine receptors, both of which represent more specialized 
pathways not present in worms. Humans have eight SRC-family 
kinases whereas worms have only one. This redundancy has con- 
founded efforts to dissect out the precise role of these CTKs in 
human biology, often requiring "triple knockouts" to demonstrate 
a deficiency. The simplicity of non-FER-like CTKs in worms may
be helpful in placing these CTKs within specific signaling cascades. 
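
As a quick consistency check on the CTK numbers, the sketch below adds the 42 FER-family members reported earlier to the non-FER CTKs listed in this paragraph. The counts come from the text; the constant and function names are hypothetical and serve only this illustration.

```python
# Counts as quoted in the text; names are illustrative assumptions.
FER_FAMILY_MEMBERS = 42  # from the FER family paragraph

NON_FER_CTKS = {
    "ACK": 2,
    "FYN": 1,
    "FRK": 1,
    "CSK": 1,
    "ABL": 1,
    "FAK": 1,
    "unclassified": 3,
}


def non_fer_total():
    """Number of worm CTKs outside the FER family."""
    return sum(NON_FER_CTKS.values())


def ctk_total():
    """FER-family members plus all non-FER CTKs."""
    return FER_FAMILY_MEMBERS + non_fer_total()


if __name__ == "__main__":
    print("non-FER CTKs:", non_fer_total())
    print("all CTKs:", ctk_total())
```

The non-FER entries sum to the 10 remaining CTKs, and adding the FER family gives the 52 CTKs stated for the worm genome.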

Protein-Tyrosine Kinases: Adaptor and Docking Molecules. Ligand 
activation of RTKs results in tyrosine phosphorylation of both the 
receptor itself (autophosphorylation) and of downstream sub- 
strates. These phosphorylated tyrosines then function as attach- 
ment sites for proteins containing SH2 and other P.Tyr-binding 
domains. We have identified 74 proteins containing a total of 77 
SH2 domains in worms. The majority of these SH2 domains are in
CTKs, two are present in a SHP2-related PTP, and the remainder
are predicted to represent orthologues of a variety of adaptor
molecules, including phospholipase Cγ, CBL, CIS4/SOCS5, CRK,
NCK, SEM-5/GRB2, SHC, tensin, STAT, and VAV. Worms also 
contain at least 16 PTB domains, which in some cases have been 
found to interact specifically with tyrosine phosphorylated proteins. 
Worm PTB-containing proteins include orthologues of SHC, which 
also contains an SH2 domain, neuronal transmembrane protein 
X11, and an insulin receptor substrate (IRS) family member. The
mammalian X11 PTB domain does not bind to P.Tyr, so we
anticipate only a few of these worm domains will function as 
P.Tyr-binding domains. Additional potential phosphoprotein- 
binding domains identified in worms include three 14-3-3 domains, 
22 WW domains, and 11 FHA domains. 

IRS-1 and IRS-2 are major substrates of the insulin receptor RTK in 
mammals, and disruption of IRS-2 in mice leads to metabolic defects 
similar to diabetes. Worms have multiple insulin-like peptides, a recep- 
tor, and an IRS orthologue, demonstrating the early origins of meta- 
bolic regulation in multicellular organisms. The presence of such a 
diverse array of adaptor molecules underscores the utility of worms as 
a model for understanding mammalian signal transduction. 

Other Protein Kinases. Approximately 15% of the worm protein
kinases do not fall into one of the six major groups but include smaller
families with representatives in higher eukaryotes, including CHK1,
DYRK, MLK, TAK1, PIM, RAF, STKR, and the mitotic kinases
(BUB1, AURORA, PLK, and NIMA/NEK). Recent genetic and
biochemical data place TAK1 (transforming growth factor β-associated
kinase) on a MAPK-like pathway at the level of a MAPKKK acting
upstream of the MAPK-family member NLK. The worm orthologues
of TAK1 and NLK regulate Wnt-mediated cell polarization during
embryogenesis (21). Biochemical data also demonstrate that this
MAPK-like pathway negatively regulates Wnt signaling because NLK
phosphorylates the TCF/LEF HMG transcription factors, thereby
inhibiting Wnt-regulated binding of the β-catenin-TCF complex to
DNA. Both of these pathways are conserved between mammals and
worms. The likely orthologous human/worm pairs on the TAK1
MAPK-like pathway include TAK1/MOM-4, NLK/LIT-1, and
TCF4/POP-1. Upstream regulators may include TGFβ1/DBL-1,
TGFβ type I receptor/SMA-6, and TGFβ type II receptor/DAF-4 (worms
have three receptor serine kinases). Additional components of the
Wnt-signaling pathway, such as cadherin, the adenomatous polyposis
coli tumor suppressor gene (APC), disheveled, and GSK-3 kinase are
also present in worms, suggesting that there may be a primordial
connection between polarized control of cell division/migration and 
cellular transformation in vertebrates (26). 

Microbial-Like Kinases: Origin of Protein Kinases? The availability of
the sequence of the first complete metazoan genome, combined
with the sequence of budding yeast and several prokaryotic and
Archaea genomes, provides an excellent opportunity to reassess
current theories on the evolutionary origin of protein kinases. Pkn1
is a bacterial protein kinase-like sequence first described in the 
Gram-negative bacteria Myxococcus xanthus, which functions in 
growth and differentiation and in the ability of this prokaryote to 
form a fruiting body in response to nutrient starvation. Pkn-related 
proteins are present in other prokaryotes, including Streptomyces, 
Bacillus, Mycobacterium, Pseudomonas, Chlamydia, and Synecho- 
cystis, where they are involved in virulence, secondary metabolism, 
sporulation, and complex growth cycles (27). However, there are no 
Pkn homologues in bacteria with less complex life cycles, such as 
Escherichia coli and Haemophilus influenzae, or in any Archaea,
suggesting they may have been acquired by horizontal transmission 
from an early eukaryote, and are unlikely to represent the ancestral 
founders to protein kinases. 

In our kinase profile searches of the worm genome, we detected 
several entries with low profile scores, yet with significant (E 
value < 10"-) random expectation (E) values. Most of these 
contained similarity to kinase subdomains I, II, and VI, containing
the consensus GxGxxGxV, VAVK, and KxDxxxxN motifs, respec-
tively. Upon further analysis, many of these entries could be
classified into distinct families designated ABC1, RIO1, YGR262,
diacylglycerol kinase, choline/ethanolamine kinases, and the
YLK1-antibiotic resistance kinases. The first three families are
named after their prototypic members in S. cerevisiae (27). 
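
To make the three consensus motifs concrete, the following sketch scans a protein sequence for them with regular expressions. It is a deliberate simplification and not the profile-search method used in the analysis: it assumes that `x` means any single residue, and the function names and the toy sequence are invented for illustration.

```python
import re

# Consensus motifs quoted in the text; 'x' is assumed to mean any residue.
CONSENSUS = {
    "subdomain I": "GxGxxGxV",
    "subdomain II": "VAVK",
    "subdomain VI": "KxDxxxxN",
}


def consensus_to_pattern(consensus):
    """Turn a consensus such as 'GxGxxG' into a regex source string."""
    return "".join("[A-Z]" if ch == "x" else ch for ch in consensus)


def find_motifs(sequence):
    """Return {subdomain: [0-based start positions]} for every motif."""
    hits = {}
    for name, consensus in CONSENSUS.items():
        # A lookahead is used so that overlapping matches are all reported.
        lookahead = re.compile("(?=" + consensus_to_pattern(consensus) + ")")
        hits[name] = [m.start() for m in lookahead.finditer(sequence)]
    return hits


# Made-up toy sequence for demonstration; it is not a real protein.
TOY_SEQUENCE = "MAGSGAFGEVAAAVAVKTTTKPDNILLNQ"

if __name__ == "__main__":
    for name, positions in find_motifs(TOY_SEQUENCE).items():
        print(name, positions)
```

Exact-match patterns like these miss the conservative substitutions that a scored profile tolerates, which is one reason low-scoring profile hits need the further family-level analysis described here.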

Worms contain three proteins related to the budding yeast
ABC1. The yeast protein is required for the assembly of the
mitochondrial cytochrome c reductase complex, which functions as
an electron carrier during oxidative phosphorylation to generate
ATP (28). ABC1 homologues are present in numerous pro-
karyotes, including Mycobacterium, Clostridium, Rickettsia, Syn-
echocystis, Azotobacter, and Enterobacteriaceae such as E. coli and
Providencia stuartii, in addition to the Archaea Methanobacterium.
ABC1-like proteins are also present in eukaryotes, including fission
yeast, Arabidopsis, worms, and mammals. Although ABC1 homo-
logues are absent from bacteria such as Mycoplasma, Bacillus,
Haemophilus, Helicobacter, and spirochetes, their presence in other
prokaryotes, Archaea, and eukaryotes positions them as likely
representatives of the primordial protein kinase, which was the
progenitor of the eukaryotic protein kinase family. Based on their
recognized role in mitochondrial ATP production and because they
maintain many of the structurally important residues and motifs
involved in ATP binding, the ABC1-family proteins may either bind
ATP or act as phosphotransferases. Conceivably, the ABC1 pro-
teins transfer phosphate to proteins as part of a feedback loop to 
sense mitochondrial ATP levels. 

The RIO1 family has three representatives in worms and is named
after one of the two homologues in budding yeast. There are also
representatives from several Archaea species, but none from bac-
teria, making them a less attractive candidate as a progenitor to the 
protein kinase lineage. 

Atypical Protein Kinases and Protein Kinase-Like Domains. Worms 
contain 26 kinase-like domains present in receptor guanylyl cyclases
(there are 10 additional soluble guanylyl cyclases), and at least 7
diacylglycerol kinases, 7 choline/ethanolamine kinases, and 30
YLK1-related antibiotic resistance kinases. Each of these families
contains short motifs that were recognized by our profile searches
with low scoring E-values, but a priori would not be expected to
function as protein kinases. Instead, the similarity could simply
reflect the modular nature of protein evolution and the primal role
of ATP binding in diverse phosphotransfer enzymes. However, two
recent papers on a bacterial homologue of the YLK1 family
suggest that the aminoglycoside phosphotransferases (APHs) are
structurally and functionally related to protein kinases (28, 29). 
There are over 40 APHs identified from bacteria that are resistant 
to aminoglycosides such as kanamycin, gentamycin, or amikacin. 
The crystal structure of one well characterized APH reveals that it 
shares >40% structural identity with the two-lobed structure of the 
catalytic domain of cAMP-dependent protein kinase (PKA), in- 
cluding an N-terminal lobe composed of a five-stranded antiparallel 
β sheet and the core of the C-terminal lobe, including several
invariant segments found in all protein kinases (29). APHs lack the
GxGxxG normally present in the loop between β strands 1 and 2 but
contain 7 of the 12 strictly conserved residues present in most
protein kinases, including the HGDxxxN signature sequence in
kinase subdomain VIB (29). Furthermore, Daigle et al. (30) have
demonstrated that this APH also exhibits protein-serine/threonine
kinase activity, suggesting that the worm YLK1-related molecules
may indeed be functional protein kinases. 

The eukaryotic lipid kinases (PI3Ks, PI4Ks, and PIPKs) also 
contain several short motifs similar to protein kinases but otherwise 
share minimal primary sequence similarity. However, once again, 
structural analysis of PIPKIIβ defines a conserved ATP-binding
core that is strikingly similar to conventional protein kinases (31).
Three residues are conserved among all of these enzymes, including
(relative to the PKA sequence) Lys-72, which binds the α-phos-
phate of ATP, Asp-166, which is part of the HRDLK motif, and
Asp-184, from the conserved Mg2+ or Mn2+ binding DFG motif
(31). The worm genome contains 12 phosphatidylinositol kinases,
including 3 PI3-kinases, 2 PI4-kinases, 3 PIP5-kinases, and 4 
PI3-kinase-related kinases. The latter group has four mammalian 
members (DNA-PK, FRAP/TOR, ATM, and ATR), which have 
been shown to participate in the maintenance of genomic integrity 
in response to DNA damage and exhibit true protein kinase activity, 
raising the possibility that other PI-kinases may also act as protein
kinases. Regardless of whether they have true protein kinase 
activity, PI3-kinases are tightly linked to protein kinase signaling, as 
evidenced by their involvement downstream of many growth factor 
receptors and as upstream activators of the cell survival response 
mediated by the AKT protein kinase. 

There are several proteins with protein kinase activity that appear
structurally unrelated to the eukaryotic protein kinases. These include
Dictyostelium myosin heavy chain kinase A, Physarum polycephalum
actin-fragmin kinase, the human A6 PTK, human BCR, mitochondrial
pyruvate dehydrogenase and branched chain fatty acid dehydrogenase
kinase, and the prokaryotic "histidine" protein kinase family. Worms
lack representatives of the actin-fragmin kinases, BCR, and bacterial
histidine kinases yet do contain a single representative of the other
classes of atypical kinases and two homologues of the A6-related PTKs.
The single worm orthologue of the Dictyostelium myosin heavy chain
kinase A (32) proves to be the worm eukaryotic elongation factor 2 
kinase (33). The slime mold, worm, and human eukaryotic elongation 
factor 2 kinase homologues have all been demonstrated to have protein 
kinase activity, yet they bear little resemblance to conventional protein 
kinases except for the presence of a putative GxGxxG ATP-binding 
motif (33). 

The so-called histidine kinases are abundant in prokaryotes, with
>20 representatives in E. coli, and have also been identified in yeast,
molds, and plants. In response to external stimuli, these kinases act as
part of two-component systems to regulate DNA replication, cell
division, and differentiation through phosphorylation of an aspartate in
the target protein (34). To date, no "histidine" kinases have been
identified in metazoans, although mitochondrial pyruvate dehydroge-
nase (PDK) and branched chain α-ketoacid dehydrogenase kinase are
related in sequence. PDK and branched chain α-ketoacid dehydroge-
nase kinase represent a unique family of atypical protein kinases
involved in regulation of glycolysis, the citric acid cycle, and protein
synthesis during protein malnutrition. Structurally, they conserve only
the C-terminal portion of "histidine" kinases, including the G box
regions. Branched chain α-ketoacid dehydrogenase kinase phosphor-
ylates the E1α subunit of the branched chain α-ketoacid dehydrogenase
complex on Ser-293, proving it to be a functional protein kinase (35).
Although no bona fide "histidine" kinase has yet been identified in
worms or humans, they do contain PDK homologues (one in worms
and five in humans). However, the paucity of PDKs in worms makes it
unlikely that they fill in for the absence of "histidine" kinases in
metazoans. Instead, these signaling cascades have more likely been
replaced by pathways initiated through RTKs.

Based on these examples of atypical protein kinases present in the 
worm genome, we predict additional worm protein kinases will be 
functionally identified that lack any of the obvious motifs conserved 
in the conventional members. Indeed, various biochemical data 
point to the existence of true histidine, lysine, and arginine kinases 
in metazoans, yet their structural identity remains a mystery. 

Protein Phosphatases. Because of their important role in signal 
transduction, it is not surprising that the activity of protein kinases 
must be tightly regulated. This is accomplished through autoinhibition,
autophosphorylation, transphosphorylation, dimerization,
and cellular localization. Equally important is the role of protein 
phosphatases, which act to remove these regulatory phosphates
from the protein kinase and its substrates. Because our analysis 
reveals worms to have a mature P.Tyr-signaling network, especially
when compared with the yeast genome, we surveyed the worm
genome for protein-tyrosine phosphatases. 

Our analysis reveals 83 conventional protein-tyrosine phospha- 
tases (PTPs) plus 6 catalytic fragments and 12 additional fragments 
with high homology to the noncatalytic portion of other worm 
PTPs. In addition, there are 26 proteins classified as dual-specificity 
phosphatases related to MAPK phosphatases, CDC14, PRL, PIR1,
CDC25, myotubularins, or PTEN lipid phosphatases. We also
identify two SBF1- and one STYX-related proteins that are related
to myotubularins and MAPK phosphatases yet lack the catalytic
cysteine motif. These proteins are predicted to be catalytically inert
yet may function as phosphoprotein-binding domains or anti- 
phosphatases (36). We also identify 11 inositol polyphosphate 
phosphatases and 65 serine-threonine phosphatases. Among the 83 
PTPs, there are 57 cytoplasmic PTPs and 26 receptor-like PTPs, 
most of which are worm specific, lacking clear human orthologues. 
Exceptions include worm orthologues of the cytoplasmic PTPs
SHP2, MEG1, and MEG2, and of the receptor PTPs PTPδ,
PTPγ, PTPμ, PTP^ and IA2 (catalytically inactive). Overall,
worms contain approximately the same number of tyrosine and 
dual-specificity protein kinases as they do tyrosine and dual- 
specificity protein phosphatases. This coordinate expansion in the 
eukaryotic lineage of both protein-tyrosine kinases and phospha- 
tases emphasizes the biological need to maintain tight regulation of 
tyrosine phosphorylation. Because of the large numbers of worm- 
specific PTKs (FER and KIN-15 families) and worm-specific PTPs 
(89%, or 66 of 74), it is tempting to speculate that these unique 
enzymes may regulate each other's activity, or function in the same 
signaling pathways. Precedent for such specificity comes from
genetic data indicating that the CLR-1 receptor PTP attenuates 
EGL-15, an FGFR orthologue, signaling in worms (37). 

Conclusions. What does the worm genome sequence tell us about 
mammalian signal transduction? First, it has provided an ideal 
model to highlight the bioinformatics challenges that lie ahead with 
the human genome effort and allows us to test our analysis tools and 
database organization. Second, it lets us refine our expectations as 
to the diversity and absolute number of unique protein kinases that 
we can expect to find in the human genome. Based on our count of 
493 (411 conventional and 82 PK-like proteins) worm kinases,
minus the ~191 kinases that appear to be worm-specific expansions
of certain families such as the CK1, FER, and KIN-15 families,
multiplied by the ~4-fold greater number of genes in humans
compared with worms, we predict the human genome to contain
~1,100 protein kinases (PTKs and serine/threonine kinases). A
similar extrapolation predicts ~300 human protein phosphatases
(PTPs, dual-specificity phosphatases, and serine-threonine phos- 
phatases). Because our current count of human protein kinases and 

1. The C. elegans Sequencing Consortium (1998) Science 282, 2012-2018.

2. Chervitz, S. A., Aravind, L., Sherlock, G., Ball, C. A., Koonin, E. V., Dwight, S. S., Harris,
M. A., Dolinski, K., Mohr, S., Smith, T., et al. (1998) Science 282, 2022-2028.

3. Clarke, N. D. & Berg, J. M. (1998) Science 282, 2018-2022.

4. Ruvkun, G. & Hobert, O. (1998) Science 282, 2033-2041.

5. Bingham, J., Plowman, G. D. & Sudarsanam, S. (1999) J. Cell. Biochem., in press.

6. Goffeau, A., Barrell, B. G., Bussey, H., Davis, R. W., Dujon, B., Feldmann, H., Galibert, F.,
Hoheisel, J. D., Jacq, C., Johnston, M., et al. (1996) Science 274, 546, 563-567.

7. Hunter, T. & Plowman, G. D. (1997) Trends Biochem. Sci. 22, 18-22.

8. Garret, J. M. (1997) Mol. Microbiol. 4, 8098-8120.

9. Knippschild, U., Milne, D. M., Campbell, L. E., DeMaggio, A. J., Christenson, E.,
Hoekstra, M. F. & Meek, D. W. (1997) Oncogene 15, 1727-1736.

10. Morgan, W. R. & Greenwald, I. (1993) Mol. Cell. Biol. 13, 7133-7143.

11. Morgan, W. R. (1996) Worm Breeder's Gazette 14, 27.

12. Murakami, S. & Johnson, T. E. (1998) Curr. Biol. 8, 1091-1094.

13. Rosato, R., Veltmaat, J. M., Groffen, J. & Heisterkamp, N. (1998) Mol. Cell. Biol. 18,
5762-5770.

14. Metzstein, M. M., Stanfield, G. M. & Horvitz, H. R. (1998) Trends Genet. 14, 410-416.

15. Hanks, S. K., Quinn, A. M. & Hunter, T. (1988) Science 241, 42-52.

16. Downward, J. (1998) Curr. Opin. Cell Biol. 10, 262-267.

17. Maehama, T. & Dixon, J. E. (1999) Trends Cell Biol. 9, 125-128.

18. Bellanger, J. M., Lazaro, J. B., Diriong, S., Fernandez, A., Lamb, N. & Debant, A. (1998)
Oncogene 16, 147-152.

19. Sun, Z., Hsiao, J., Fay, D. S. & Stern, D. F. (1998) Science 281, 272-274.

20. Matsuoka, S., Huang, M. & Elledge, S. J. (1998) Science 282, 1893-1897.

21. Meneghini, M. D., Ishitani, T., Carter, J. C., Hisamoto, N., Ninomiya-Tsuji, J., Thorpe, C. J.,
Hamill, D. R., Matsumoto, K. & Bowerman, B. (1999) Nature (London) 399, 793-797.



phosphatases stands at ~600 and 130, respectively, we still have
about half the work ahead of us. However, recent claims predict the
human genome may contain as many as 140,000 genes, compared
with previous estimates of ~80,000. Such calculations would result
in a significant increase in our predictions of the total number of
human protein kinases and phosphatases.
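
The kinase extrapolation described in the Conclusions (worm count, minus
worm-specific expansions, scaled by the human-to-worm gene ratio) can be
written out as a short calculation. The sketch below is illustrative only:
the function names are hypothetical, and the second function assumes that
predictions scale linearly with the estimated human gene number, which the
text implies but does not state. With a ratio of exactly 4 the arithmetic
gives about 1,200, the same order as the ~1,100 quoted in the text, where
both the worm-specific count and the fold difference are approximate.

```python
def extrapolate_human_count(worm_total, worm_specific, gene_ratio):
    """Scale the non-worm-specific worm count by the human:worm gene ratio."""
    return (worm_total - worm_specific) * gene_ratio


def rescale_for_gene_estimate(prediction, old_gene_estimate, new_gene_estimate):
    """Assumption: the prediction grows linearly with the human gene estimate."""
    return prediction * new_gene_estimate / old_gene_estimate


# Figures taken from the text: 493 worm kinases, ~191 worm-specific, ~4-fold.
kinases = extrapolate_human_count(worm_total=493, worm_specific=191, gene_ratio=4)
print(kinases)  # 1208

# Effect of raising the human gene estimate from ~80,000 to 140,000
# on the stated predictions (~1,100 kinases, ~300 phosphatases).
print(rescale_for_gene_estimate(1100, 80_000, 140_000))  # 1925.0
print(rescale_for_gene_estimate(300, 80_000, 140_000))   # 525.0
```

Under that linear assumption, the higher gene estimate would raise both
predictions by a factor of 1.75.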

We may expect to see less evolutionary expansion of protein kinase
families that serve elemental cellular functions such as cell cycle control
and chromosome segregation, compared with processes involved in
intercellular signaling or organogenesis. However, there is already
evidence for at least a 2-fold expansion in the number of CDKs and
"mitotic kinases" from worms to humans. Unlike expressed sequence 
tag data mining and PCR-based gene discovery approaches, genomic 
strategies do not bias against genes whose expression is tightly regulated 
in a cell-, developmental-, or disease-specific manner. This point is
highlighted by the identification of 650 seven-transmembrane chemoreceptors
in the worm genome (1), many of which may be expressed
exclusively in single neurons. Because worms have only ~302 neurons,
compared with one trillion in humans, it would not be surprising to see
this selectivity in cellular expression corroborated on mining the human
genome. Indeed, because many of these novel protein kinases are likely 
to exhibit highly restricted expression, they may prove to be excellent 
targets for drug discovery in the battle against human disease. 

The worm serves as a much simpler and more tractable organism than
humans for deciphering signaling cascades. Although their P.Tyr- 
signaling system is quite mature — based on the content of protein- 
tyrosine kinases, phosphatases, and adaptor molecules — they lack 
much of the molecular redundancy that exists in mammals, allowing 
the geneticist, biochemist, and cell biologist to more readily gen- 
erate an "outline" of the signaling pathways that are conserved 
between worms and humans. The availability of the complete worm 
genome provides a unique opportunity to learn about human 
biology. Predicted orthologous pairs of human and worm genes can 
be targeted by using reverse genetic approaches to identify new 
signaling partners or biologic functions that can then be biochem- 
ically and functionally verified in mammals. 

Although worms and humans have much in common, they also have 
obvious differences. Worms do not have limbs or bones, or a circulatory
or immune system, and they eat only bacteria. Not surprisingly, they lack 
several protein kinases present in humans. Over the next 2 years, we 
should be better able to define which protein kinases are required for 
these specialized functions as the genome sequences of Drosophila and
humans are completed. Identification and classification of the proteins 
present in each is just a first step toward understanding the biological 
complexity of life.